grahamegrieve · September 22, 2017 04:44 · chrisgrenz · Sep 14, 2017
diff --git a/BulkDataAccessPlan b/BulkDataAccessPlan
 Use cases

 This document describes a way of granting an application access to data on a set of patients. 
 The application can request a copy of all pertinent (clinical) data on the patients in a 
 single download. Note: We expect this data to be quite large. 

 Authorizing Access

 Access to the data is granted by using the SMART backend services spec 
 (url: http://docs.smarthealthit.org/authorization/backend-services/). 

 We didn’t see a need for Group/* or Launch/* kind of scopes - System/*.read will do
 fine (or User/*.* for interactive processes, though interactive processes are out of 
 scope for this work).

 Accessing Data

 The application can do either of the following queries:

 GET [base]/Patient/$everything?start=[date-time]&_type=[type,type] 
 GET [base]/Group/[id]/$everything?start=[date-time]&_type=[type,type] 

 Notes:
 * The first query returns all data on all patients that the user account has access to, since the starting date time provided. 
 * The second query provides access to all data on all patients in the nominated group. How the Group resource is 
  created/identified/defined/managed is out of scope for now 
  (the question of whether we need to sort this out has been referred to ONC). 
 * The start date/time restricts the response to records changed since the nominated time. In the absence of the parameter, all data is returned
 * The _type parameter is used to specify which resource types are part of the focal query (no impact on which related 
  resources are included). In the absence of this parameter, all types are included. This includes at least the CCDS
 * The FHIR specification will be modified to allow Patient/$everything to cross patients, and to add $everything to Group
 * Group will be added as a compartment type in the base Specification
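
 As a rough illustration of the two queries above, the sketch below builds the request URL from its parts. The function name, its arguments, and the example base URL are hypothetical and not part of this plan; only the path shapes and the start and _type parameters come from the text.

```python
from urllib.parse import quote


def everything_url(base, group_id=None, start=None, types=None):
    """Build a $everything URL for all patients, or for one Group (illustrative only)."""
    if group_id:
        path = f"Group/{quote(group_id, safe='')}/$everything"
    else:
        path = "Patient/$everything"
    params = []
    if start:
        # Percent-encode the date-time so a '+' timezone offset is not read as a space.
        params.append("start=" + quote(start, safe=""))
    if types:
        # _type takes a comma-separated list of resource types.
        params.append("_type=" + ",".join(quote(t, safe="") for t in types))
    url = f"{base.rstrip('/')}/{path}"
    return url + ("?" + "&".join(params) if params else "")
```

 Leaving out start and types gives the unfiltered query, which per the notes above means all data of all types.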
  
 Generally, this is expected to result in quite a lot of data. The client is expected to request this asynchronously, per RFC 7240. 
 To do this, the client uses the Prefer header:

  Prefer: respond-async

 When the server sees this request header, instead of generating the response and then returning it, the server returns a 
 202 Accepted status and a Content-Location header giving the location at which the client can access the response. 
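
 A minimal sketch of the kick-off request, using only the Python standard library. It prepares the request without sending it. The function name is hypothetical, and the Bearer Authorization header is an assumption about how the token obtained through the backend services flow would be presented; the Prefer and Accept values are the ones named in this plan.

```python
import urllib.request


def kickoff_request(url, access_token, accept="application/fhir+ndjson"):
    """Prepare (but do not send) an asynchronous GET request (illustrative only)."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": accept,
            # Ask the server to process the request asynchronously (RFC 7240).
            "Prefer": "respond-async",
            # Assumption: the access token is sent as an OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
        },
    )
```

 Passing the returned object to urllib.request.urlopen would send it; a server following this plan would answer with 202 Accepted and a Content-Location header to poll.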

 The client then queries this content location using GET content-location (no prefixing). The response can be one of 3 outcomes:

 * a 202 Accepted that indicates that processing is still happening. This response has no body. 
  It may also have an X-Progress header that provides some indication of progress to the user 
 * a 5xx Error that indicates that preparing the response has failed. The body is an OperationOutcome describing the error
 * a 200 OK with the response for the original request. This response can carry an X-Available-Until header to indicate when
  the response will no longer be available, and one or more Link: headers that list the files that are available for download 
  after preparation is complete
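
 The three outcomes above could be distinguished on the client roughly as follows. This is a sketch only: the function name and the returned labels are hypothetical, and the handling of any other status code is an assumption, since the plan does not describe it.

```python
def classify_status(status_code, headers):
    """Map a polling response to a (state, detail) pair (illustrative only)."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if status_code == 202:
        # Still processing; X-Progress is optional, so detail may be None.
        return "pending", h.get("x-progress")
    if 500 <= status_code <= 599:
        # Preparation failed; the body carries an OperationOutcome.
        return "failed", None
    if status_code == 200:
        # Done; X-Available-Until is optional, so detail may be None.
        return "complete", h.get("x-available-until")
    # Assumption: anything else is outside the protocol described here.
    raise ValueError(f"unexpected status {status_code}")
```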

 Notes:
 * This asynchronous protocol will be added as a general feature to the FHIR spec for all calls. Server discretion when to support it.
 * Client can cancel a task or advise the server it's ok to delete the outcome using DELETE content-location.
 * Other than the 5xx response, these responses have no body, except when the accept content type is 'text/html', in 
  which case the responses have an HTML representation of the content in the header (e.g. a redirect, an error, or 
  a list of files to download) (server discretion whether to support text/html)
 * Link headers can have one or more links in them, per RFC 5988
 * todo: decide whether to add 'supports asynchronous' flag to the CapabilityStatement resource
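
 Because a single Link header can carry several links, a client needs to split them out before downloading. The sketch below collects the link targets from one or more header values. It is a simplification, not a full RFC 5988 parser: it ignores link parameters such as rel, and it would be confused by a quoted parameter value that itself contains angle brackets. The function name and example URLs are hypothetical.

```python
import re


def link_targets(link_header_values):
    """Collect the <...> targets from a list of Link header values (illustrative only)."""
    targets = []
    for value in link_header_values:
        # Each link target is written between angle brackets.
        targets.extend(re.findall(r"<([^>]*)>", value))
    return targets
```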

 Format of returned data

 If the client uses the Accept type of application/fhir+json or application/fhir+xml, the response will be a bundle in the 
 specified format. Alternatively, the client can use the type application/fhir+ndjson. In this case the response is a 
 set of files in ndjson format (see http://ndjson.org/). Each file contains only resources of a single type.
 There can be more than one file for each resource type. Bundles are broken up at Bundle.entry.resource - i.e. bundle entries
 have a full URL, and the resource for the entry will be found in the relevant download. (todo: how does that work for history?)
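
 For readers unfamiliar with ndjson, each line of a downloaded file is one complete JSON document, here one resource. A minimal reading sketch follows; the function name is hypothetical, and the whole file is assumed to fit in memory, which may not hold for real bulk downloads.

```python
import json


def parse_ndjson(text):
    """Parse ndjson text into a list of resources (illustrative only)."""
    resources = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # that may legally appear unescaped inside JSON strings.
    for line in text.split("\n"):
        if line.strip():  # skip blank lines, e.g. after a trailing newline
            resources.append(json.loads(line))
    return resources
```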

 Notes:
 * the response - whether a Bundle or the ? manifest - will include a server time that can be used as the start time on a following query.
 * clients should be prepared to receive resources that change on the boundary more than once (still todo)
 * application/fhir+ndjson will be documented in the base spec
 * may need to do some registration work for +ndjson
 * May need to describe further formats (avro/parquet etc) later - consultation to follow
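
 Since the handling of resources that change on the boundary is still todo, the following is only one possible client-side approach, not something this plan prescribes: key each resource by type and id, and let a copy received later replace an earlier one. The function name and the dictionary store are hypothetical.

```python
def merge_resources(store, resources):
    """Merge resources into store, keyed by (resourceType, id) (illustrative only)."""
    for resource in resources:
        # Assumption: a copy seen later supersedes one seen earlier.
        store[(resource["resourceType"], resource["id"])] = resource
    return store
```

 Applying this to each successive download, with start set to the server time from the previous response, leaves a single copy of every resource even if one is delivered twice.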

 Subscriptions

 Subscriptions are not supported at this time - applications can perform this query as needed