A DataFrame is a distributed collection of data, which is organized into named columns. Conceptually, it is equivalent to relational tables with good optimization techniques.
| id | name | age |
|---|---|---|
| 1201 | phil | 25 |
| 1202 | barbara | 28 |
| 1203 | jon | 39 |
| 1204 | dirk | 23 |
| #!/usr/bin/env python3 | |
| ''' | |
| Script to detect models downstream of changed macros, relative to a git branch. | |
| Usage: | |
| $ python3 models_using_changed_macros.py --branch master --children --manifest_path /path/to/manifest.json | |
| ''' | |
| import os |
| {% macro get_period_boundaries(target_schema, target_table, timestamp_field, start_date, stop_date, period) -%} | |
| {% call statement('period_boundaries', fetch_result=True) -%} | |
| with data as ( | |
| select | |
| coalesce(max({{timestamp_field}}), {{start_date}})::timestamp as start_timestamp, | |
| coalesce( | |
| {{dbt_utils.dateadd('millisecond', | |
| -1, | |
| "nullif('" ~ stop_date ~ "','')::timestamp") |
| #!/usr/bin/env python3 | |
| """ | |
| CI script to check: | |
| 1. Models have both a unique and not_null test. | |
| 2. Models have a description and columns (i.e. a schema.yml entry) | |
| """ | |
| import json | |
| import logging | |
| import os | |
| import subprocess |
| #!/usr/bin/env python3 | |
| '''Script to autogenerate dbt commands for changed models against a chosen git branch, | |
| with support for fully refreshing models with specific tags. | |
| Usage: | |
| $ python3 dbt_run_changed.py --target_branch master --target dev --commands [run, test] --full_refresh_tags [full_refresh] | |
| Assume model1 and model2 are changed models and model2 is tagged with "full_refresh". The script will generate three dbt commands: | |
| 1. dbt run --target dev --model model2 --full-refresh |