Accessing records is non-trivial. PII is stored across the various services an organisation operates. These might have been built internally or by external agencies. They may have a low bus factor, and it can be hard for someone new to figure out what data a service stores, let alone query it. Auditing all your data and developing workflows for querying it can involve many people across multiple companies, and potentially dozens of person-hours for a single Subject Access Request.
Deleting records can be difficult because of foreign key constraints. Blanking out data can be a way to work around this in existing services, but new services will need to be designed with Subject Removal Requests in mind.
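To illustrate the foreign key problem, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` and `orders` tables and the helper names are hypothetical, invented for this example; a real service would have its own schema and its own rules about which columns count as PII.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database where orders reference users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            total_pence INTEGER NOT NULL
        );
        INSERT INTO users VALUES (1, 'Ada Example', 'ada@example.com');
        INSERT INTO orders VALUES (100, 1, 2499);
        """
    )
    return conn


def try_delete_user(conn: sqlite3.Connection, user_id: int) -> bool:
    """Attempt a hard delete; return False if a foreign key blocks it."""
    try:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def blank_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Overwrite the PII columns but keep the row so references stay valid."""
    conn.execute(
        "UPDATE users SET name = NULL, email = NULL WHERE id = ?", (user_id,)
    )
    conn.commit()
```

In this sketch the delete is refused because an order still points at the user, while blanking succeeds and leaves the order history intact. Whether blanked data satisfies a given removal request is a legal question this example does not answer.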
    Server webapp
      ^  ^  ^  ^
      |  |  |  |
      |  |  |  -- Client library      <- mssql database (remote)
      |  |  |                       Q
    A |  |  ----- Client library    U <- redis store
    P |  |                          E
    I |  -------- Client framework  R <- mysql database (localhost)
      |           integration       Y <- avatars/{user-guid}.jpg
      |
      ----------- Standalone client   <- nosql database (managed cloud service)
A Server webapp that's easy to deploy on a new host (via package managers, shell scripts, containers, or all of the above). It allows data protection staff to search by PII (email, name, GUID etc.) to find records across the various other data stores.
Client libraries for as many languages as possible, which can be integrated into existing projects and configured with information about which database tables, redis keys, filesystem paths etc. contain which PII. They respond to access requests from the Server webapp via an agreed, secured API. For handling data removal, they might also provide utilities for blanking out data rather than deleting it.
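As a rough idea of what "configured with information about which tables, keys and paths contain which PII" could look like, here is a hypothetical sketch in Python. `PiiSource`, `PII_MAP` and `sources_for` are names invented for this example, as are the table, column and key names; no client library with this interface exists yet.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PiiSource:
    store: str     # kind of data store, e.g. "mysql", "redis", "filesystem"
    location: str  # table.column, key pattern, or path template
    pii_type: str  # what can be searched for here: "email", "name", "guid"


# Hypothetical configuration for one service.
PII_MAP: List[PiiSource] = [
    PiiSource("mysql", "users.email", "email"),
    PiiSource("mysql", "users.full_name", "name"),
    PiiSource("redis", "session:{user-guid}", "guid"),
    PiiSource("filesystem", "avatars/{user-guid}.jpg", "guid"),
]


def sources_for(pii_type: str, pii_map: Sequence[PiiSource] = PII_MAP) -> List[PiiSource]:
    """Return every configured location that holds the given kind of PII."""
    return [source for source in pii_map if source.pii_type == pii_type]
```

A client could call `sources_for("guid")` when the Server asks for everything tied to a GUID, then query each returned location with whatever driver that store needs.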
Client integrations for popular WAFs. For example, someone could extend piihub-client-php to create piihub-client-laravel, making it even easier for developers to add support.
For adding Subject Access Request support to legacy services where the domain knowledge has been lost (e.g. to staff turnover or buses), there could be Standalone Client implementations for as many environments as possible, which can access (read-only) the same database as an existing web service.
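The read-only requirement for a Standalone Client can be enforced at the connection level rather than by convention. The sketch below assumes, purely for illustration, that the legacy service uses an SQLite file with a `users` table; other databases would instead use a read-only account. `open_read_only` and `find_by_email` are hypothetical helper names.

```python
import sqlite3
from pathlib import Path
from typing import List, Tuple


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt fails."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def find_by_email(conn: sqlite3.Connection, email: str) -> List[Tuple]:
    """Look up records for a subject; the users table is an assumed schema."""
    return conn.execute(
        "SELECT id, name, email FROM users WHERE email = ?", (email,)
    ).fetchall()
```

With the connection opened this way, a bug in the Standalone Client cannot modify or delete data belonging to the service it is inspecting.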
These Client implementations will need to know how to talk to the Server, so an API will need to be designed.
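The API is still to be designed, so the following is only a sketch of what the messages might contain, assuming JSON over some secured transport. `AccessRequest`, `AccessResponse` and all of their fields are hypothetical names, not an agreed format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class AccessRequest:
    request_id: str  # lets the Server match responses to requests
    pii_type: str    # "email", "name", "guid", ...
    value: str       # the value being searched for


@dataclass
class AccessResponse:
    request_id: str
    client: str  # which client produced these records
    records: List[Dict[str, Any]] = field(default_factory=list)


def encode(message) -> str:
    """Serialise a request or response to JSON with stable key order."""
    return json.dumps(asdict(message), sort_keys=True)


def decode_request(payload: str) -> AccessRequest:
    """Parse a JSON payload sent by the Server into an AccessRequest."""
    return AccessRequest(**json.loads(payload))
```

Keeping the messages this small would make it easier to write clients in many languages, which is one of the stated goals.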
The proposed Server webapp can query many services across disparate systems. Care should be taken to employ a fool-proof authentication implementation, as this entire approach represents a single point of failure. Alternatively, a different approach that avoids this worrisome single point of failure should be considered.
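One possible building block for authenticating Server-to-Client requests is a shared-secret signature. This is a sketch of one option, not a recommendation: it assumes each client holds its own secret, and the `sign` and `verify` helpers and the five-minute window are choices made for this example. It uses only Python's standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over the timestamp and the request body."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    body: bytes,
    timestamp: int,
    signature: str,
    now: Optional[int] = None,
    max_age: int = 300,
) -> bool:
    """Accept a request only if it is recent and the signature matches."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > max_age:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

A scheme like this only proves that a request came from a holder of the secret. It does nothing about who is allowed to log in to the Server webapp itself, which would need its own protection.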
If someone has a login for the Server webapp, we may still not want them to be able to perform certain queries or actions. Abuse is a concern since the barrier to retrieving PII is lowered. The purpose of this proposal is to improve internet users' privacy, and by centralising access to data in this manner, we may inadvertently make things worse. Proper authorisation (role based? ACL?) will be needed to ensure all Subject Access Requests and Subject Removal Requests are carried out lawfully and morally.
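If the role-based route were chosen, the core check could be as small as the sketch below. The role names, action names and the `authorise` helper are all hypothetical; which roles should exist and what they may do is exactly the open question raised above.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "data_protection_officer": {"access_request", "removal_request"},
    "support_agent": {"access_request"},
}


class NotAuthorised(Exception):
    """Raised when a role is not permitted to perform an action."""


def authorise(role: str, action: str) -> None:
    """Raise NotAuthorised unless the role explicitly grants the action."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise NotAuthorised(f"role {role!r} may not perform {action!r}")
```

The check denies by default: an unknown role or an unlisted action is refused rather than allowed.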
- PII: Personally identifiable information
- WAF: Web application framework
- GUID: Globally unique identifier
- Bus factor: How many people would have to be run over by a bus before your organisation has no way of figuring out how a service works or what data is stored where. Higher is better.