Eka Ponkratova eponkratova
@eponkratova
eponkratova / technical_data_audit.csv
Last active March 29, 2025 08:37
technical_data_audit
Domain,Sub-domain,Questions
Operational excellence,Infrastructure as code,"How do you automate the deployment of databases, data pipelines, and ETL processes?"
,Monitoring and observability,"What practices do you use for logging, monitoring, and alerting for data operations?"
,,"How do you monitor data quality metrics (freshness, completeness, anomalies, etc.)?"
,Incident & change management,How do you track and manage data schema changes and pipeline modifications?
,,How do you document data pipeline incidents and troubleshooting?
,Automation & orchestration,"How do you automate ingestion, cleansing, and transformation processes?"
,,Do you use any orchestration tools?
Security,Encryption & data protection,"Do you use data encryption at rest (storage) and in transit (ETL, ingestion)?"
,,What methods do you use to secure sensitive data fields?
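As an illustration of the data quality question above (freshness and completeness), here is a minimal Python sketch. It is not part of the audit checklist. The function names `freshness_ok` and `completeness`, the 24-hour threshold, and the sample rows are all hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def freshness_ok(last_loaded_at, max_age, now=None):
    """Return True if the latest load is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


def completeness(rows, required_fields):
    """Return, per required field, the share of rows with a non-empty value."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in required_fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
        for field in required_fields
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": ""},
    {"id": 4, "email": "d@example.com"},
]
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc)

fresh = freshness_ok(last_load, timedelta(hours=24), now=now)
scores = completeness(rows, ["id", "email"])
```

In practice such checks would usually run inside whatever orchestration or monitoring tool is already in place, with thresholds agreed per dataset.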
eponkratova / medium.txt
Last active January 9, 2024 23:42
python_history_maintenance_dwh
'
eponkratova / dbt.csv
Last active February 13, 2023 05:39
dbt integrations
tool,tool category,link,integration,dbt Cloud/dbt Core (use with caution)
Sisu,AI/ML,https://sisudata.com/,...you can define your metrics in dbt and then use them in Sisu for one-click analyses.,dbt Cloud
Continual,AI/ML,https://continual.ai/,"Continual integrates with dbt by allowing dbt users to define entities, feature sets, and predictive models directly from their existing dbt models.",dbt Cloud
Holistics,BI,https://www.holistics.io/,"Holistics fully integrates with your dbt project, allows you to perform data modeling and transformation at dbt layer, and push those definitions to Holistics BI layer",dbt Cloud/dbt Core
mode,BI,https://mode.com/,Mode customers can now get better views on data freshness with our dbt integration.,dbt Cloud
thoughtspot,BI,https://www.thoughtspot.com/,"ThoughtSpot’s dbt integration allows you to easily provide your existing dbt models and automatically create ThoughtSpot Worksheets, which you can use to search your data.",dbt Cloud
Transform,BI,https://transform.co/,
eponkratova / README.md
Created November 25, 2022 06:39 — forked from jeremyyeo/README.md
Overriding dbt Cloud default database / schema on CI runs #dbt

Overriding dbt Cloud default database / schema on CI runs

-!  🚨                                          WARNING                                          🚨  !-
You probably do not want to do this because dbt Cloud will not be able to drop the relevant schema 
upon PR merge / close so you will end up with clutter if you are not on top of this.

The following is the default behaviour of [dbt Cloud CI runs][1] when:

| | AWS Glue Studio | AWS Glue DataBrew |
| --- | --- | --- |
| Source | S3; AWS Glue Data Catalog (S3, RDS, Redshift, etc.); Streaming (AWS Kinesis Data Streams, Kafka) | Manual upload; Direct connection using JDBC; AWS Glue Data Catalog (S3, Redshift, RDS); Amazon AppFlow; AWS Data Exchange; Snowflake |
| Algorithm | No information, but as per https://www.acf.hhs.gov/sites/default/files/documents/opre/opre-understanding_effect_opioid_epidemic_child_maltreatment-jan2022.pdf, k-means clustering is used | No information |