This is some example code I used to extract asset metadata (REPORT/DATA_SOURCE) from Data Studio.
Warning: the information provided via the Data Studio API is very limited. If you are looking for more detail, such as what a REPORT is constructed from or which backend a DATA_SOURCE connects to, unfortunately this will not help. That said, it is still better than nothing, since it tells you which assets exist within the organization.
The extracted information could potentially be cataloged in another service such as Data Catalog. However, that is beyond what this sample covers.
Data Studio
- Two types of assets are accessible via the API: REPORT and DATA_SOURCE
- Data Studio API :
Service accounts with G Suite domain-wide delegation
Not within the scope of the examples here.
G Suite Audit logs
- Data Studio audit logs are also captured
- https://support.google.com/datastudio/answer/9690662
Data Catalog
- Metadata can be stored as custom entries
- https://cloud.google.com/data-catalog/docs/how-to/custom-entries
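Cataloging itself is outside this sample, but the mapping step can be sketched without calling any API. The helper below is hypothetical and not part of this repo. It assumes the field names of the Data Catalog Entry resource (`display_name`, `user_specified_system`, `user_specified_type`) and assumes entry IDs may contain only letters, digits and underscores. Check both against the custom entries documentation linked above before relying on it.

```python
import re


def to_custom_entry(asset):
    """Map one extracted asset dict to (entry_id, entry_fields).

    Hypothetical helper: it only builds plain dicts and does not call
    the Data Catalog API.
    """
    # Assumption: entry IDs allow only letters, digits and underscores,
    # so every other character in the asset name is replaced and a
    # letter prefix is added.
    entry_id = "ds_" + re.sub(r"[^A-Za-z0-9_]", "_", asset["name"])
    entry = {
        "display_name": asset["title"],
        # Assumed values; any consistent naming scheme would do.
        "user_specified_system": "data_studio",
        "user_specified_type": asset["assetType"].lower(),
    }
    return entry_id, entry
```

The returned pair would then be passed to whichever Data Catalog client is in use. That call is deliberately left out here.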
.
├── assets2json.py
├── dsclient
│ ├── __init__.py
│ ├── auth.py
│ └── datastudio.py
└── key.json // your service account key file
Run it with something like this:
python assets2json.py > data.json
The output is an array of objects, each holding the metadata of either a REPORT or a DATA_SOURCE.
[
{
"name": "fcadaf00-6a5e-4c99-b56d-XXXXXXXXXXXX",
"title": "gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX",
"assetType": "DATA_SOURCE",
"updateTime": "2020-11-09T14:24:38Z",
"updateByMeTime": "2020-11-09T14:24:38Z",
"createTime": "2020-11-06T16:37:35Z",
"lastViewByMeTime": "2020-11-09T14:24:38Z",
"owner": "[email protected]",
"permissions": {
"OWNER": {
"members": [
"user:[email protected]"
]
},
"VIEWER": {
"members": [
"user:[email protected]"
]
}
}
},
{
"name": "7578a73a-4c39-4947-88b5-XXXXXXXXXXXX",
"title": "Copy of [Sample] Google Analytics Marketing Website",
"assetType": "REPORT",
"updateTime": "2020-11-06T10:03:27Z",
"updateByMeTime": "2020-11-06T09:36:55Z",
"createTime": "2020-11-06T08:46:22Z",
"lastViewByMeTime": "2020-11-06T10:03:39Z",
"owner": "[email protected]",
"permissions": {
"OWNER": {
"members": [
"user:[email protected]"
]
},
"VIEWER": {
"members": [
"user:[email protected]"
]
}
}
}
......
]
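Once data.json exists, a few lines of Python are enough to get an overview of it. The sketch below assumes the output has exactly the shape shown above. The `summarize` and `viewers` helpers are hypothetical and not part of this repo.

```python
from collections import Counter, defaultdict


def summarize(assets):
    """Count assets per assetType and group asset titles by owner."""
    by_type = Counter(a["assetType"] for a in assets)
    by_owner = defaultdict(list)
    for a in assets:
        by_owner[a.get("owner", "unknown")].append(a["title"])
    return by_type, dict(by_owner)


def viewers(asset):
    """Return members holding the VIEWER role, or [] if none are listed."""
    return asset.get("permissions", {}).get("VIEWER", {}).get("members", [])


# Usage, assuming data.json was produced as described above:
#   import json
#   with open("data.json") as f:
#       by_type, by_owner = summarize(json.load(f))
```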
Data Studio API
- Can list assets, but can be very inefficient. The number of requests is at least
(number of users) * (number of reports each) * 2
where the factor of 2 is the additional call to fetch permissions on each report. This code will therefore be pretty slow in large orgs, as it has to fetch the list of reports per user. It would be better implemented as a parallel fetch.
- The user list should be fetched from the G Suite directory, or expanded from Groups.
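The request estimate and the parallel fetch suggested above could look roughly like this. It is a minimal sketch using only the standard library, not the code in this repo: `fetch_assets_for_user` is a hypothetical callable standing in for whatever per-user listing the client performs, and deduplicating by asset `name` is an assumption about how results from different users should be merged.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_requests(num_users, reports_per_user):
    """Lower bound quoted above: users * reports each * 2."""
    return num_users * reports_per_user * 2


def fetch_all(users, fetch_assets_for_user, max_workers=8):
    """Call fetch_assets_for_user(user) concurrently and merge the results.

    fetch_assets_for_user is a hypothetical callable that returns a list
    of asset dicts (each with a "name" key) visible to the given user.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; an exception raised in
        # a worker is re-raised here when its result is consumed.
        for assets in pool.map(fetch_assets_for_user, users):
            for asset in assets:
                # Keep the first copy of an asset seen by several users.
                merged.setdefault(asset["name"], asset)
    return list(merged.values())
```

Threads suit this workload because the time is spent waiting on HTTP responses. Note that per-user fields such as lastViewByMeTime are lost for all but the first user when assets are merged this way, and that API quota may limit how many workers are useful.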