After a brief jaunt through code and a think, I think I would summarize my reluctance to try to solve the client-determined identifier need for groups using pubid
as follows. This doesn’t touch on the authorization or invite bits involved with pubid
, just some high-level things.
I don’t know if this is deep enough to claim that I’ve looked at how hard it would be to update pubid
in all the various places, but I realize my argument stands even if group.pubid
were easy to modify. I feel like it inherently does something else than the notion we’re after here. Here are my thoughts!
- One of the current core things about
pubid
s is that they are known to be unique for a resource across all of our data. This is reflected in their use to identify a single resource in 17 different routes in our application at present. - However, our need for a client-determined identifier needs to enforce uniqueness per authority — the same identifier could be reused in multiple authorities. I believe this to be a design necessity, alert me if I’m wrong.
- If we changed
pubid
to be unique-per-authority, we’d have a bunch of routes that no longer point to unique resources, but possibly one URl that could represent multiple resources. Uh oh! organization
also makes use of apubid
. If we changed the waypubid
worked forgroups
(vis-a-vis their uniqueness or composition),organization
andgroup
would be out of sync, which introduces some complexity—organization
pubid
s could be relied upon to point to a single resource, but notgroup
pubid
s. Uh oh!- Introducing a
groupname
field would be concordant with the way identifiers work forusers
(username
), and could be brought into yet more harmony by the adoption of agroupid
syntax in line with the way we computeuserid
s
We have this system, roughly, which can be applied to various resources:
- An internal unique ID, never exposed (i.e. an auto-increment)
- A service-generated, web-facing, guaranteed-unique
pubid
. Thispubid
has some extra meaning WRTgroup
right now (having possession of it gives you powers). In the future, I would suggest this should perhaps not be the case, but I’m going to put that aside for now. - A client-determined, unique-per-authority identifier (
user.username
,;proposedgroup.groupname
). The client/authority owns managing these for uniqueness. - Thus: There is a clear line between
pubid
(owned/managed by the service) and*name
(owned/managed by the authority’s client(s)). - A computed syntax that can be used to form a service-wide unique identifier by combining
*name
with theauthority
, e.g. the way we computeuserid
(account:<username>@<authority>
). We could also have agroupid
syntax.
Extending this a little into the future (fuel for ruminating, not immediate):
user
s don’t havepubid
s. Perhaps they should? Could be useful and consistent. Discuss? :)- If our resources had a consistent combination of these, resource endpoints could ostensibly accept either a
pubid
or a*id
(userid
,groupid
, etc.) syntax…
I’m starting to realize that there is an inherent incompatibility between all of the following requirements in play at once:
- Have a client-definable, unique-to-authority identifier for a resource AND
- That identifier must be mutable AND
- We want an UPSERT endpoint that uses that mutable identifier as the, well, identifier for a resource
My hunch is that the first two requirements are necessary. The first is relatively self-explanatory (and how we got here in the first place) and the second follows naturally, especially as—as Rob pointed out, thank you—the username
field on user
is mutable, and this identifier is supposed to be client-owned—thus the client should be in control and able to mutate that property’s value.
However, the third requirement arises from a desire to limit requests made from the LMS app—UPSERT
operations are not actually RESTful and trying to impose them here is starting to show some fracture lines. Erp.
My earlier proposal of an endpoint in the style of PUT /api/groups/{groupid}
where groupid
is a userid
-like construct, e.g. ”group:[email protected]”
is not sound, as the somegroupidentifier
could be mutated—and even potentially in that very request. Yikes. That is not good.
(Note: I’m going to call the client-owned ID groupname
and the service-owned ID pubid
in the following but that doesn’t imply those names are baked; I’m also using an invention of groupid
as defined above, similar to userid
, but that’s provisional, of course, too).
- Create a new group:
POST /api/groups
(with agroupname
in the request body); no change here; - Retrieve a group:
GET /api/groups/{pubid}
ORGET /api/groups?groupname=somegroupidentifier&authority=myauthority.org
ORGET /api/groups?groupid=group:[email protected]
- Update a group:
PATCH /api/groups/{pubid}
- Totally legal to update the group’s
groupname
in this request
- Totally legal to update the group’s
In this model, we definitely need to retain something pubid
-like. We need that truly unique, system-wide identifier. It needs to be something that cannot be changed by the client.
And in this model, the pubid
is the unique reference to the resource; thus the modeling of retrieving-by-groupname-or-groupid is really a search function.
With respect to groups, on launch:
- Attempt to retrieve the group using the client-determined identifier:
GET api/[email protected]
- If 404 response, the group doesn’t exist. If current LTI user is an instructor:
POST /api/groups
with request body containing desired identifier (i.e. create the group)
- If 404 response, the group doesn’t exist. If current LTI user is an instructor:
- At this point, we should have a
200
response with the group resource in it—which includespubid
PATCH /api/groups/{pubid}
with any fields you want to change in the body (or the same request body as group-create, if you want). In any case, aPATCH
Yes, I concede (sorry!): this is 2 requests related to groups on app launch (possible max of 3 if group doesn’t exist yet), but I think the API design may be more correct. It also gives the LMS app an opportunity to decide whether it wishes to perform metadata sync (i.e. updating) of resources on every launch or just sometimes/less frequently. It could also allow for a situation in which the requesting client wants to verify that the resource it retrieves is indeed the one it intended to retrieve before blindly updating it.
- I do believe we still need a distinction between a service-owned, non-mutable, unique reference (currently
pubid
) and a client-defined, authority-specific identifier (*name
or whatever). I don’t see an easy way to retirepubid
s entirely, without replacing them with something else in this (unique ID) vein. - I’ve banged my head against models for an
UPSERT
operation for a couple of months now and I keep coming to RESTful dead ends. Sorry! - In this model, technically,
groupname
is not an ID so much as just-a-regular field that can be used as an identifier in some use cases. See also:username
. So our IDs are stillpubid
s. This is important to note: we wouldn’t be introducing a new ID concept here.
This proposed direction satisfies the needs to not store pubid
s in the LMS database and the ability for the client to define some form of unique identifier, but does not implement UPSERT.
I've organised my responses into a very long comment!!
Executive summary:
I think the upshot of all this mostly matches Lyza's proposal except I'm suggesting
Group.provider_unique_id
instead ofGroup.groupname
? (And I think it should be immutable and an upsert API would be better, but these may be less crucial differences.)For clarity what I think we should do is:
Add optional, per-authority-unique
Group.provider_unique_id
field (new column on group table)Where in the LMS app's case
<PROVIDER_UNIQUE_ID>
will be a SHA-1 of a couple of LTI launch params that globally uniquely identify the LMS course.This API only needs to be available to authclients, in fact only the LMS app, for now, and may not ever need to be a public API. (And doesn't preclude us from having a different public API for groups.)
Arguments and details and responses:
Provider IDs aren't like pubids
Lyza's gist makes this point as well but I just wanted to double make it. It's been commented that there might not be a good reason to have two separate group IDs (the existing pubids plus some kind of new, caller-provided ID) in the long-term. I've always wanted to add a new, second type of ID to groups because that's going to be quickest and easiest and I think the requirements and constraints on pubids vs the provider IDs that're required by the LMS app and other third-party integrations are quite different and incompatible. Hopefully this table gets that across:
Some notes on these
I think we all wish that group pubids being secret and unguessable was not the basis of privacy for private groups. But it is, and will be so for the foreseeable future. Replacing the group invite system is not seriously on the cards right now. So we do need to reckon with this.
I also understand that you actually can't join LMS groups using their pubid / group invite page because LMS groups are third-party so the group invite page doesn't work for them. But getting group pages (and activity pages generally) working for LMS and third-party accounts is something we hear about all the time, and the LMS are groups are literally private groups (the same as first-party private groups) in the DB. I don't want to have one kind of private group (first-party ones) for which the pubid can be used for joining and another kind of private group (third-party ones) for which it can't, and place another major roadblock in our way if and when we come to making group pages work for third-party groups.
Pubids have to be server-generated because they have to be unguessable because they can be used to join the group. That's why they're generated randomly. So they can't be caller-provided. (This was the main requirement when we designed pubids -- they were not just meant as URL-friendly IDs.)
As Lyza has pointed out pubids are globally unique whereas we only want provider IDs to have to be within-authority unique because we don't want different providers to have to worry about colliding with eachother's IDs.
I also think having a short, URL-friendly, globally unique ID that is guaranteed to exist for every group is nice and potentially useful. It's nice for example that this can be used to form a nice, short unique URL (
/groups/xyz123
), or display a unique ID for the group, without having to include the authority in it.We already have two different IDs for users
The idea that we should do everything with just one kind of ID, instead of having different IDs for different purposes,
doesn't follow existing architecture in h: a user in h has a username and also zero-or-more
provider_unique_id
s.This was done for good reason as the requirements and constraints on usernames are quite different from those on provider
unique IDs.
Provider IDs aren't like usernames either
Lyza's gist likens caller-provided group IDs to usernames and calls them groupnames. I don't think the needed group IDs are similar to usernames:
Usernames are required, whereas provider unique IDs should be optional (non-LMS groups don't need them)
Usernames are displayed in the UI and used in URLs, so they have to be short and human readable. For provider IDs it's
actually undesirable for them to want to be short, human-readable or URL friendly, as this limits what existing IDs the
provider can use for them, and makes generating IDs harder for the provider.
Provider IDs are opaque IDs, not names.
The group IDs that the LMS app is going to create are going to be SHA-1's of the
tool_consumer_instance_guid
andcontext_id
. These are IDs, they're not "names". This applies to future similar integrations too. Whenever an integration wants to take some outside object (in the LMS app's case a Canvas course) and create a group for it, it's going to be very useful if the integrator can take or generate some kind of ID for the outside object (any kind of digest, or UUID, etc) and use that as the ID for the Hypothesis group. This makes integrating a lot easier.We don't want to make it more difficult for integrators by calling it a "name" and thereby encouraging them to think that it should be human readable and display friendly, might appear in URLs or in the UI, might want to be limited to a certain character set or be short, etc.
We want to encourage them to say "just generate an opaque ID or use any kind of existing ID you have".
We want to encourage integrators to use the kind of opaque and not human or particularly URL friendly strings that come out of standard universal ID and digest algorithms.
Groups already have names.
If we introduce a new field called
groupname
, then the group create and update APIs will have two separate fields in the body namedname
andgroupname
.eLife and the LMS app are abusing usernames
A "username" is supposed to be a user (human)-chosen unique identifier that can be used in display or in URLs to publicly identify that user. eLife uses usernames that look like
fs4ngbko
. The LMS app uses a SHA-1 of thetool_consumer_instance_guid
anduser_id
LTI launch params. These are opaque IDs, not usernames. They can't be used in URLs or displayed in the UI as usernames normally are in Hypothesis. The reason why both eLife and the LMS app have to stuff opaque IDs into the username field is that username is a required field in Hypothesis, and we haven't taken the refactoring time to make it optional. We're getting away with this because third-party users don't have URLs yet (user pages don't work for third-party users yet) and because non-unique display names are always enabled for third-party users.Usernames aren't mutable
Lyza's gist (second revision part) says that usernames are mutable so groupnames -- since they're intended to be similar to usernames -- must be mutable too, and therefore a groupname-based upsert API isn't possible. The reasons why usernames can be changed (by sending a support request to us) don't apply to caller-provided opaque IDs for groups:
Also, I don't really think usernames are mutable:
Changing a username would break eLife or the LMS app
If eLife of LMS users were able to change their usernames by any means this would break the app. For example the LMS app uses a SHA-1 of LTI data to compute the username and re-find the existing user account in h. If the username in h has changed since the LMS app created the account, it would fail to find it. Generally speaking IDs (user or group) that're provided by the integrator being mutable are hostile to the integrator -- if someone changes the ID the integration will break.
Group provider IDs are like user provider IDs
What the LMS app and other similar integrations need is to be able to take the unique ID of an outside object (a Canvas course in the LMS app's case), or generate one (a SHA-1 in the LMS app's case, but we want to be able to support any kind of digest or UUID within reason), and then create or update a Hypothesis group using that ID.
Group.provider_unique_id
field onmodels.Group
Group.provider_unique_id
but there's no reason for other groups to have one.The only difference between
Group.provider_unique_id
andprovider_unique_id
s for users is that users have zero-or-more provider unique IDs whereas groups have zero-or-one. This is why user provider-unique-ids are in a separate table (UserIdentity.provider_unique_id
) whereas group ones just need to be a new column on the group table (Group.provider_unique_id
). There's good reason for users to be able to have more than one ID -- it's to enable multiple logins / login integrations in future. But this isn't a concern for groups. (Also having more than one group provider ID might make designing update or upsert APIs for this much harder.)Questions about REST
The ID doesn't need to be mutable, at least not from the LMS app's point of view currently (or likely ever). If mutability makes the API much harder for the LMS app and other integrators to use then it's actively undesirable from the integrators point of view. Maybe worth asking if we have an upsert API that the LMS app uses and that upsert API doesn't allow mutating the
provider_unique_id
, does that preclude us from also having an up_date_ API that does allow mutating it? The LMS app would have no need for such an API. But if mutability is desired is there anything preventing us from having the mutable restful API as well as the API that LMS app needs?Does having a RESTful API preclude us from also having some other API endpoints that may not be strictly restful, in cases where they'll make the integrations we need to implement simpler and more robust?