Last active
March 29, 2025 08:37
technical_data_audit
| Domain | Sub-domain | Questions | |
|---|---|---|---|
| Operational excellence | Infrastructure as code | How do you automate the deployment of databases, data pipelines, and ETL processes? | |
| | Monitoring and observability | What practices do you use for logging, monitoring, and alerting for data operations? | |
| | | How do you monitor data quality metrics (freshness, completeness, anomalies, etc.)? | |
| | Incident & change management | How do you track and manage data schema changes and pipeline modifications? | |
| | | How do you document data pipeline incidents and troubleshooting? | |
| | Automation & orchestration | How do you automate ingestion, cleansing, and transformation processes? | |
| | | Do you use any orchestration tools? | |
| Security | Encryption & data protection | Do you use data encryption at rest (storage) and in transit (ETL, ingestion)? | |
| | | What methods do you use to secure sensitive data fields? | |
| | Identity & access management | Do you enforce the principle of least privilege on data access? | |
| | | Do you regularly audit access to sensitive data? | |
| | Compliance & auditability | Do you have audit logs for data access and modifications? | |
| | | Do you review compliance with relevant data privacy laws? | |
| | Data loss prevention & recovery | Do you have backup and recovery strategies? | |
| | | Do you regularly test and validate your recovery processes? | |
| Reliability | Fault tolerance & redundancy | Do you have replication and redundancy of critical data assets? | |
| | | How do you ensure the availability of databases? | |
| | Disaster recovery | Do you regularly back up critical databases, data warehouses, and pipelines? | |
| | | Do you test failover scenarios for critical data infrastructure? | |
| | Data integrity | Do you have data validation and integrity checks at the ingestion and transformation stages? | |
| | Stability & resilience | Is your data infrastructure capable of scaling? | |
| Performance efficiency | Data retrieval | Have you implemented database indexing, partitioning, and materialized views? | |
| | | Do you profile and optimize queries? | |
| | ETL/ELT efficiency | What data formats and data compression techniques do you use? | |
| | Resource efficiency | How do you monitor resource consumption and optimize storage solutions? | |
| | Data architecture | Do you periodically review and improve data models? | |
| Cost optimization | Storage cost optimization | Do you review and optimize data storage tiers? | |
| | | Do you have data lifecycle management (archive and retention policies)? | |
| | Compute resource optimization | What ETL solution do you use (serverless vs. managed)? | |
| | Licensing & operational cost | Do you regularly review and optimize software and cloud service usage? | |
| | Monitoring | Do you have a budget and alerts for cloud spending? | |
| Sustainability | Data lifecycle & retention | Do you automate data lifecycle management, including deleting obsolete resources? | |
| | | Do you audit data stores for unused or redundant data? | |
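
The data quality question under Monitoring and observability names freshness and completeness as example metrics. As a rough illustration only, the sketch below shows one way these two metrics might be computed over a batch of records. The function names, the `loaded_at` field, and the thresholds are all hypothetical and not part of the audit itself.

```python
from datetime import datetime, timedelta, timezone


def freshness_lag(rows, ts_field, now):
    """Time elapsed since the most recently loaded row (rows must be non-empty)."""
    return now - max(row[ts_field] for row in rows)


def completeness(rows, required_fields):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(field) not in (None, "") for field in required_fields)
        for row in rows
    )
    return complete / len(rows)


def failed_checks(rows, ts_field, required_fields, max_lag, min_completeness, now):
    """Return the names of the checks that fall outside the given thresholds."""
    failures = []
    # An empty batch is treated as stale, since nothing was loaded.
    if not rows or freshness_lag(rows, ts_field, now) > max_lag:
        failures.append("freshness")
    if completeness(rows, required_fields) < min_completeness:
        failures.append("completeness")
    return failures


# Hypothetical sample batch: two of the four rows lack an email value.
utc = timezone.utc
rows = [
    {"id": 1, "email": "a@example.com", "loaded_at": datetime(2025, 3, 29, 8, 0, tzinfo=utc)},
    {"id": 2, "email": "", "loaded_at": datetime(2025, 3, 29, 7, 0, tzinfo=utc)},
    {"id": 3, "email": "c@example.com", "loaded_at": datetime(2025, 3, 29, 6, 0, tzinfo=utc)},
    {"id": 4, "email": None, "loaded_at": datetime(2025, 3, 29, 5, 0, tzinfo=utc)},
]
now = datetime(2025, 3, 29, 8, 30, tzinfo=utc)

print(failed_checks(rows, "loaded_at", ["id", "email"],
                    max_lag=timedelta(hours=1), min_completeness=0.9, now=now))
```

In practice the list of failed checks would feed whatever alerting channel the team already uses. The thresholds here are placeholders to be agreed per dataset.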