LSP Authentication

Introduction

This document covers the core technologies and the interactions between services, APIs, and applications that interact with the LSST Science Platform (LSP). The authentication use cases covered in this document concern interactive users who primarily interact with the LSP in the following ways:

Logged into the notebook aspect
Logged into the portal aspect
From a local terminal, exercising the API aspect through third-party libraries or applications

For the users in question, we make a few assumptions about use of the LSP in this context:

Users are active, continuously or intermittently, over the course of an extended work day. Interactions will typically last several hours, though the system should be prepared for interactions lasting up to 24 hours.
At the start of an interaction with the LSP system, users do not know which types of data they will touch.
Users include LSST staff and scientists who are members of science collaborations.
Users already have an NCSA account through some mechanism.

Due to these assumptions, this document will focus on identity-based authentication and authorization, though we will cover aspects of role-based and capabilities-based authorization for the LSP.

Identity and Access Management

For clarity, this document defines users and groups, as well as the minimal requirements that the LSP, and likely the broader Data Management System, places on the IAM system. Further definition of the IAM system is expected to be available in the future in a change-controlled document.

User

A user is minimally identified by a UNIX ID number (UID) and a user name. A service MUST be able to look up a UID given a user name, and a user name given a UID.
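One way a service could satisfy the bidirectional lookup requirement is Python's standard pwd module, which resolves accounts through the host's name service configuration. This is a minimal sketch that assumes the service host is configured to resolve IAM accounts this way (for example through an LDAP-backed NSS module); that configuration is an assumption, not something this document specifies.

```python
import pwd


def uid_for_username(username: str) -> int:
    """Return the UID for a user name; raises KeyError if the user is unknown."""
    # Assumes IAM accounts are visible through the host's name service (NSS)
    return pwd.getpwnam(username).pw_uid


def username_for_uid(uid: int) -> str:
    """Return the user name for a UID; raises KeyError if the UID is unknown."""
    return pwd.getpwuid(uid).pw_name
```

On a typical Linux host, `uid_for_username("root")` returns 0 and `username_for_uid(0)` returns `"root"`.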

Real Accounts

A user account identifying a specific person is a real account.

Shared Accounts

A user account may actually be a shared account, which does not identify a specific person. A shared account must have at least one real account managing the shared account.

A shared account is nominally for the use and organization of shared resources, such as disk storage and limited compute, if available.
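The rule that every shared account must be managed by at least one real account can be checked mechanically. The sketch below uses a hypothetical `Account` record and `validate_account` helper; neither is part of any existing IAM API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record: real accounts have shared=False."""
    username: str
    uid: int
    shared: bool = False
    # UIDs of the accounts allowed to manage this account (used for shared accounts)
    managers: set[int] = field(default_factory=set)


def validate_account(account: Account, accounts_by_uid: dict[int, Account]) -> None:
    """Raise ValueError if a shared account has no real account managing it."""
    if not account.shared:
        return
    real_managers = [
        uid for uid in account.managers
        if uid in accounts_by_uid and not accounts_by_uid[uid].shared
    ]
    if not real_managers:
        raise ValueError(f"shared account {account.username!r} needs at least one real manager")
```

Note that a shared account managed only by other shared accounts is rejected, since none of its managers identifies a specific person.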

Shared Account Use Cases

A shared account might be created for a number of purposes, including:

Projects, such as Stack Club
Teams, such as Alerts, DAX, and SQuaRE
Science Collaborations
Data Releases

Groups

A user is a member of one or more groups. All groups defined by a given user are owned exclusively by that user. The IAM system SHOULD disallow group names that are not representable as UNIX group names within the Data Management System, including file systems and databases. Users who manage a shared account are implicitly allowed to manage that account's groups, including creating them and managing their membership.
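A group-name check could be enforced at creation time. The rule below is an assumption chosen to be conservative: a lowercase letter or underscore first, then lowercase letters, digits, or underscores, with at most 32 characters. It omits characters that some UNIX tools accept (such as hyphens) so that names also remain usable as unquoted database role names.

```python
import re

# Assumed conservative rule (not defined by the IAM system): lowercase letter or
# underscore first, then lowercase letters, digits, or underscores, 32 chars max.
_GROUP_NAME = re.compile(r"[a-z_][a-z0-9_]{0,31}")


def is_valid_group_name(name: str) -> bool:
    """Return True if name satisfies the assumed UNIX- and database-safe rule."""
    return _GROUP_NAME.fullmatch(name) is not None
```

For example, `firefly_admin` passes, while `Stack Club` (uppercase letters and a space) and `9lives` (leading digit) are rejected.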

A core set of groups does not belong to any specific user. These groups are defined and managed by the LSST system administrators.
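The ownership rules for groups can be summarized as a single authorization check. In this sketch, `Group`, `can_manage_group`, and the representation of shared-account managers and administrators are hypothetical; a core group is modeled as a group with no owner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    name: str
    owner: Optional[str]  # None for core groups managed by system administrators


def can_manage_group(actor: str, group: Group,
                     shared_account_managers: dict[str, set[str]],
                     admins: set[str]) -> bool:
    """Decide whether actor may create a group, rename it, or change its membership."""
    if group.owner is None:
        # Core groups belong to no user: only LSST system administrators manage them
        return actor in admins
    if actor == group.owner:
        # Users own the groups they define
        return True
    # Managers of a shared account may manage that account's groups
    return actor in shared_account_managers.get(group.owner, set())
```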

Group membership MUST be discoverable through at least an LDAP service provided by the IAM system. Additional services for querying group membership MAY be implemented.
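A membership query against the LDAP service is ultimately a search filter. The sketch below assumes an RFC 2307-style schema (`posixGroup` entries listing members in `memberUid`); the actual schema of the IAM LDAP service is not specified here. The user name is escaped as RFC 4515 requires, so that characters such as `*` or `(` cannot alter the filter.

```python
def escape_ldap_filter_value(value: str) -> str:
    """Escape a value for use inside an LDAP search filter (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def group_membership_filter(username: str) -> str:
    """Build a filter matching the posixGroup entries that list username as a member."""
    # Assumed schema: RFC 2307 posixGroup with memberUid attributes
    return f"(&(objectClass=posixGroup)(memberUid={escape_ldap_filter_value(username)}))"
```

The resulting string would be passed as the search filter to whichever LDAP client library the service uses.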

Clients querying a group membership service SHOULD cache results. Results SHOULD be cached with a TTL of no less than 30 seconds and no more than 1 hour. A 5-minute TTL is recommended.
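A client-side cache honoring these bounds might look like the following sketch. The `fetch` callable stands in for the actual membership query (for example, an LDAP search) and is hypothetical, as is the class name. The sketch rejects TTLs outside the 30 second to 1 hour range rather than silently clamping them, and it accepts an injectable clock so that expiry is easy to test.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

MIN_TTL = 30.0       # seconds: lower bound from the text
MAX_TTL = 3600.0     # seconds: upper bound (1 hour)
DEFAULT_TTL = 300.0  # seconds: recommended 5 minutes


class GroupMembershipCache:
    """Cache group lookups for a bounded TTL in front of a membership service."""

    def __init__(self, fetch: Callable[[str], FrozenSet[str]], ttl: float = DEFAULT_TTL,
                 clock: Callable[[], float] = time.monotonic):
        if not MIN_TTL <= ttl <= MAX_TTL:
            raise ValueError(f"TTL must be between {MIN_TTL} and {MAX_TTL} seconds")
        self._fetch = fetch  # hypothetical query function, e.g. an LDAP search
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def groups_for(self, username: str) -> FrozenSet[str]:
        """Return cached groups if still fresh; otherwise query and cache them."""
        now = self._clock()
        cached = self._entries.get(username)
        if cached is not None and now < cached[0]:
            return cached[1]
        groups = self._fetch(username)
        self._entries[username] = (now + self._ttl, groups)
        return groups
```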

Users and services should be made aware of the caching TTL, as well as of potential latencies due to user and group synchronization. It may take up to 2 hours for groups to be synchronized.

Some systems may not allow permissions to be assigned to users, only to groups. In that case, each user MUST be a member of a group whose only member is that user. This would enable another user to extend access to a resource by, for example, assigning a read permission to the user's primary group.
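The sketch below illustrates this arrangement, often called a user private group, on a system that only accepts group permissions. The naming convention (the private group shares the user's name) and the helper names are hypothetical.

```python
from typing import Dict, Set


def private_group_name(username: str) -> str:
    # Hypothetical convention: the private group has the same name as the user
    return username


def private_groups_ok(group_members: Dict[str, Set[str]], usernames: Set[str]) -> bool:
    """Check that every user has a private group whose only member is that user."""
    return all(group_members.get(private_group_name(u)) == {u} for u in usernames)


def groups_of(username: str, group_members: Dict[str, Set[str]]) -> Set[str]:
    return {g for g, members in group_members.items() if username in members}


def can_read(username: str, group_members: Dict[str, Set[str]], read_groups: Set[str]) -> bool:
    """On a group-only ACL system, a user can read if any of their groups has read access."""
    return bool(groups_of(username, group_members) & read_groups)
```

Granting read access to `private_group_name("bob")` gives access to Bob alone, which is equivalent to a per-user permission.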

User and Groups Synchronization

When necessary, the IAM system SHOULD create users and groups in underlying systems and synchronize membership accordingly. Synchronization SHOULD finish in under an hour, and MUST finish within 24 hours.
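A synchronization pass can be planned as a difference between the membership recorded in the IAM system and the membership found in an underlying system. This is a minimal sketch with a hypothetical `plan_sync` helper; it deliberately does not delete groups that exist only in the underlying system, since that policy is not defined here.

```python
from typing import Dict, List, Set, Tuple


def plan_sync(desired: Dict[str, Set[str]], actual: Dict[str, Set[str]]
              ) -> Tuple[List[str], List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Compare IAM membership (desired) with an underlying system (actual).

    Returns the groups to create, the (group, user) memberships to add, and the
    (group, user) memberships to remove, in a deterministic order.
    """
    create = sorted(set(desired) - set(actual))
    add: List[Tuple[str, str]] = []
    remove: List[Tuple[str, str]] = []
    for group in sorted(desired):
        current = actual.get(group, set())
        add.extend((group, user) for user in sorted(desired[group] - current))
        remove.extend((group, user) for user in sorted(current - desired[group]))
    return create, add, remove
```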

In databases, groups should be represented as roles.
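For a database that supports roles, representing a group means creating a role for it and granting that role to each member. The sketch below emits PostgreSQL-style statements; the dialect is an assumption, and Oracle, MySQL, and Qserv differ in syntax and capabilities. A real synchronizer would also check whether the role already exists before creating it.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier PostgreSQL-style, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def role_statements(group: str, members: list[str]) -> list[str]:
    """Emit statements that represent an IAM group as a database role (PostgreSQL syntax)."""
    # NOLOGIN: the role stands for a group and is not used to log in directly
    stmts = [f"CREATE ROLE {quote_ident(group)} NOLOGIN;"]
    stmts += [f"GRANT {quote_ident(group)} TO {quote_ident(user)};" for user in members]
    return stmts
```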

The assignment of privileges on resources to users and groups is out of scope for this document.

Roles

There is currently no concept of roles in the existing NCSA IAM system. A system that represents roles must also associate permissions with those roles. Roles are therefore generally out of scope for this document, but they are mentioned for informational purposes.

It's possible that roles may be implemented through some combination of shared accounts and group membership. For example, a firefly shared account may have the groups firefly_usdac_user, firefly_pdac_user, and firefly_admin defined. In this example, these groups are effectively roles.
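Continuing the firefly example, a service could derive a user's effective roles from membership in the shared account's groups. The mapping from group names to role names below is hypothetical.

```python
from typing import Dict, Set

# Hypothetical mapping from groups owned by the "firefly" shared account to roles
FIREFLY_ROLE_GROUPS: Dict[str, str] = {
    "firefly_usdac_user": "usdac_user",
    "firefly_pdac_user": "pdac_user",
    "firefly_admin": "admin",
}


def firefly_roles(user_groups: Set[str]) -> Set[str]:
    """Treat membership in a firefly group as holding the corresponding role."""
    return {role for group, role in FIREFLY_ROLE_GROUPS.items() if group in user_groups}
```

Groups that do not belong to the mapping (such as `stackclub`) confer no firefly role.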

Capabilities

We expect some form of capabilities-based authorization will be useful for the Data Management System in the future. This section outlines what such a system is and the requirements for implementing it.

A capabilities-based security system is based on the object-capability security model.

A capabilities-based system, in the context of the LSST DM system, would rely on:

A definition of resources across the LSST DM system to which access rights can be assigned, such as dataset collections (Butler repos), database tables, and services;
A reference to a resource or set of resources, such as a token, which the system can validate and use to enforce access control;
A definition of operations to be performed on the resource, such as read, write, delete, and execute.

Together, the reference and operation can be included in a message, which represents a capability. For the system to be secure, the message MUST be unforgeable. This would typically be implemented through a cryptographic signature.

For the issuance of capabilities, the following are required:

A method of determining the set of capabilities for a given user or use case; and
A system that issues the unforgeable message (a token or certificate), which either implements that method itself or is notified of its results by another system that implements it.
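One possible method of determining a user's capabilities, assuming capabilities are derived from group membership, is a policy table mapping each group to the capabilities it confers. The policy contents and resource names below are hypothetical; the issuing system would then sign the resulting set.

```python
from typing import Dict, Set, Tuple

Capability = Tuple[str, str]  # (resource, operation)

# Hypothetical policy: which capabilities each group confers
POLICY: Dict[str, Set[Capability]] = {
    "dr1_readers": {("butler:dr1", "read")},
    "stackclub": {("db:stackclub.scratch", "read"), ("db:stackclub.scratch", "write")},
}


def capabilities_for(user_groups: Set[str],
                     policy: Dict[str, Set[Capability]] = POLICY) -> Set[Capability]:
    """Return the union of the capabilities granted by each of the user's groups."""
    result: Set[Capability] = set()
    for group in user_groups:
        result |= policy.get(group, set())
    return result
```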

Low-level systems, including disk storage (NFS, GPFS, S3/Swift/Ceph) and databases (Oracle, MySQL), have no way of enforcing capabilities-based authorization. Implementing a capabilities-based security system therefore requires a service in front of those systems that can process the messages.

To process a request with a capabilities message, a service MUST:

Agree on the definition of resources in the message, mapping them to the resources the system (or underlying system) manages;
Agree on the definition of operations in the message, mapping them to the operations the system (or underlying system) implements; and
Examine the request and verify that ALL resource and operation pairs the request needs are represented in the message.
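The three steps above can be sketched as a single check performed after the message's signature has been verified. The mapping tables and names here are hypothetical: `resource_map` and `operation_map` translate the message's vocabulary into the service's own, and the request is allowed only if every pair it needs is covered.

```python
from typing import Dict, Iterable, Set, Tuple

Capability = Tuple[str, str]  # (resource, operation) as named in the message


def authorize(required: Iterable[Tuple[str, str]], granted: Iterable[Capability],
              resource_map: Dict[str, str], operation_map: Dict[str, str]) -> bool:
    """Allow the request only if every (local resource, local operation) it needs is granted.

    Pairs in the message that do not map to this service's resources or operations
    are ignored rather than treated as errors.
    """
    local_grants: Set[Tuple[str, str]] = {
        (resource_map[r], operation_map[op])
        for r, op in granted
        if r in resource_map and op in operation_map
    }
    return all(pair in local_grants for pair in required)
```

For example, a database front end might map `("db:dr1.object", "read")` to `("dr1.Object", "SELECT")` and reject any query that also touches a table not listed in the message.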

Authentication

Authentication by a real user is handled by the IAM system. All authentication for LSP services is handled by the IAM system through the OAuth2 protocol.
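In the OAuth2 authorization code flow, an LSP client begins by redirecting the user's browser to the IAM system's authorization endpoint. The sketch below builds that request using the parameters defined in RFC 6749, section 4.1.1. The endpoint URL, client ID, redirect URI, and scope used with it are hypothetical, since this document does not specify them.

```python
import secrets
from urllib.parse import urlencode


def authorization_url(authorize_endpoint: str, client_id: str,
                      redirect_uri: str, scope: str) -> tuple[str, str]:
    """Build an OAuth2 authorization-code request (RFC 6749, section 4.1.1).

    Returns the URL to send the browser to and the random state value, which the
    client must compare against the state returned on the redirect.
    """
    state = secrets.token_urlsafe(16)  # protects against cross-site request forgery
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state
```

After the user authenticates, the IAM system redirects back with a code, which the client exchanges for tokens at the token endpoint.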

Authentication for a shared account is out of scope for this document. It is expected that users may be members of groups that are owned by shared accounts, but they will always authenticate as themselves.

Authentication by other means, such as Kerberos, is out of scope for this document.

Service Access Authorization

LSP services MAY limit access by users through group membership. In these cases, a service needs to acquire a list of groups associated with a user, either as claims in a token, or through a membership query to a service.
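A service that limits access by group membership might first look for group claims in the token and fall back to a membership query otherwise. In this sketch, the claim name `groups`, the use of the `sub` claim as the user name, the `query_membership` callable, and the "any one required group suffices" policy are all assumptions.

```python
from typing import Callable, Iterable, Mapping, Set


def user_groups(claims: Mapping[str, object],
                query_membership: Callable[[str], Iterable[str]],
                claim_name: str = "groups") -> Set[str]:
    """Prefer group claims carried in the token; otherwise ask the membership service."""
    groups = claims.get(claim_name)  # assumed claim name
    if isinstance(groups, list):
        return set(groups)
    # Assumes the token's subject is the user name known to the membership service
    return set(query_membership(str(claims["sub"])))


def allowed(claims: Mapping[str, object],
            query_membership: Callable[[str], Iterable[str]],
            required_groups: Set[str]) -> bool:
    """Grant access if the user belongs to at least one of the required groups."""
    return bool(user_groups(claims, query_membership) & required_groups)
```

In practice `query_membership` would be backed by a cache such as the one recommended for group membership lookups.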

Data Access Authorization

Low-level systems SHOULD be relied upon to authorize access to data. These include:

Disk storage, such as NFS and GPFS;
Databases, such as Oracle or Qserv.

brianv0/auth.md