Stream map support for `meltano run` has landed via a new Singer-compatible mapper plugin type!
Like `meltano run` itself, it needs testing and feedback.
Mappers allow you to transform or manipulate data after extraction and before loading:
- Streams/properties can be aliased to provide custom naming downstream.
- Stream records can be filtered based on any user-defined logic.
- Properties can be transformed inline (e.g. converting types, sanitizing PII).
- Properties can be removed from the stream.
- New properties can be added to the stream.
Note that mappers are currently only available when using `meltano run`.
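To make the capabilities listed above concrete, here is a minimal sketch of what a mapper conceptually does to a stream of Singer messages. It is not how any particular mapper plugin is implemented: `apply_mapping`, the `@bot.example` filter rule, and the sample records are all hypothetical. The only thing taken from the Singer spec is that rows travel as `RECORD` messages carrying `stream` and `record` keys.

```python
def apply_mapping(messages):
    """Yield transformed Singer messages (illustrative only, hypothetical rules)."""
    for msg in messages:
        if msg.get("type") != "RECORD":
            # Simplification: a real mapper would also rewrite SCHEMA messages
            # so they match the aliased stream and the changed properties.
            yield msg
            continue
        record = dict(msg["record"])
        # Filter: drop records based on user-defined logic.
        if str(record.get("author_email", "")).endswith("@bot.example"):
            continue
        # Transform inline: mask a PII field.
        if "author_email" in record:
            record["author_email"] = "hidden"
        # Remove a property from the stream.
        record.pop("id", None)
        # Add a new property to the stream.
        record["source"] = "gitlab"
        # Alias the stream so it gets a custom name downstream.
        yield {**msg, "stream": msg["stream"] + "_orig", "record": record}


messages = [
    {"type": "SCHEMA", "stream": "commits", "schema": {}, "key_properties": []},
    {"type": "RECORD", "stream": "commits",
     "record": {"id": 1, "author_email": "a@example.com"}},
    {"type": "RECORD", "stream": "commits",
     "record": {"id": 2, "author_email": "ci@bot.example"}},
]
out = list(apply_mapping(messages))
```

The second record is filtered out, and the first one comes through masked, without its `id`, with an extra `source` property, on a stream renamed to `commits_orig`.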
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: hide-gitlab-secrets
    config:
      transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
  - name: who-needs-ids
    config:
      transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
We've gained a new mapper plugin type and associated config.
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:  # <--------------------------- top level mappings key
  - name: hide-gitlab-secrets
    config:
      transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
  - name: who-needs-ids
    config:
      transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
This is a bit different than usual: you don't define a single top-level `config` for the
mapper. Instead, you define `mappings`!
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: hide-gitlab-secrets  # <--------- mapping config with two actions
    config:
      transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
  - name: who-needs-ids  # <--------------- mapping config with another
    config:
      transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
You can define multiple mappings per mapper, and then invoke each one by name.
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: hide-gitlab-secrets
    config:  # <----------------- config gets passed to the plugin at invocation
      transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
  - name: who-needs-ids
    config:  # <---------------- config gets passed to the plugin at invocation
      transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
The `config` defined for each mapping is what actually gets passed to the mapper
plugin at invocation time. What the config holds will differ between plugins...
mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: who-needs-ids
    config:  # <---------------- config will vary by plugin
      transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
- name: awesome-custom-transform
  pip_url: very-awesome-dataco-transforms
  mappings:
  - name: fix-ids-in-commits
    config:  # <---------------- config will vary by plugin
      transformations:
        - key: id
          set: 42
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  executable: meltano-map-transform
  mappings:
  - name: backup-commits
    config:  # <---------------- config will vary by plugin
      stream_maps:
        commits:
          __alias__: "commits_orig"
Same...but....different.
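As a rough illustration of "config gets passed to the plugin at invocation", the sketch below serializes one mapping's `config` to a JSON file and builds a command line around it. The `--config <file>` flag follows the usual Singer convention for taps and targets; whether Meltano invokes a given mapper exactly this way is an assumption here, and `build_invocation` is a hypothetical helper, not part of Meltano.

```python
import json
import tempfile


def build_invocation(executable, mapping_config):
    """Write a mapping's config to a temp JSON file; return (argv, path).

    Hypothetical helper: assumes the plugin accepts `--config <path>`.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump(mapping_config, fh)
        path = fh.name
    return [executable, "--config", path], path


# The `who-needs-ids` mapping config from the slide, as a Python dict.
who_needs_ids = {
    "transformations": [
        {"field_id": "id", "tap_stream_name": "commits", "type": "SET-NULL"}
    ]
}
argv, config_path = build_invocation("transform-field", who_needs_ids)
```

The point is only that the plugin sees the mapping's `config` verbatim, which is why its shape can differ completely from one mapper to the next.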
Invoke one:
$ meltano run tap-gitlab who-needs-ids target-jsonl
And the mapping name will get resolved to the plugin:
~~~graph-easy --as=boxart
[ who-needs-ids ] - to -> [ transform-field ]
~~~
~~~graph-easy --as=boxart
[ tap ] - to -> [ mask author_email \nmask committer_email ] - to -> [ target ]
~~~
Invoke n+1:
$ meltano run tap-gitlab hide-secrets custom-thing fix-id target-jsonl
~~~graph-easy --as=boxart
[ tap ] - to -> [ transform-field (hide-secrets) ] - to -> [ custom ] - to -> [ transform-field (fix-id) ] - to -> [ target ]
~~~
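The name resolution shown in the diagrams can be sketched in a few lines. This is not Meltano's actual implementation: `MAPPERS` is a hypothetical in-memory view of the `mappers` section from the earlier slides (configs trimmed to empty lists for brevity), `index_mappings` and `resolve_run` are made-up helpers, and mapping names are assumed to be unique across mappers.

```python
# Hypothetical, trimmed-down view of the `mappers` section of the project file.
MAPPERS = [
    {
        "name": "transform-field",
        "mappings": [
            {"name": "hide-gitlab-secrets", "config": {"transformations": []}},
            {"name": "who-needs-ids", "config": {"transformations": []}},
        ],
    },
    {
        "name": "awesome-custom-transform",
        "mappings": [
            {"name": "fix-ids-in-commits", "config": {"transformations": []}},
        ],
    },
]


def index_mappings(mappers):
    """Build a lookup from mapping name to (plugin name, mapping config)."""
    index = {}
    for mapper in mappers:
        for mapping in mapper.get("mappings", []):
            index[mapping["name"]] = (mapper["name"], mapping["config"])
    return index


def resolve_run(args, mappers):
    """Replace each mapping name with the plugin that owns it; keep other args."""
    index = index_mappings(mappers)
    return [index[arg][0] if arg in index else arg for arg in args]


pipeline = resolve_run(
    ["tap-gitlab", "hide-gitlab-secrets", "fix-ids-in-commits",
     "who-needs-ids", "target-jsonl"],
    MAPPERS,
)
```

Here the same plugin, `transform-field`, ends up in the pipeline twice, once per mapping, with `awesome-custom-transform` running in between.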
...Time to flip tabs and see it in action...
_________________________
< less slides more demos! >
-------------------------
O
O
o
\||/
| @___oo
/\ /\ / (__,,,,|
) /^\) ^\/ _)
) /^\/ _)
) _ / / _)
/\ )/\/ || | )_)
< > |(,,) )__)
|| / \)___)\
| \____( )___) )___
\______(_______;;; __;;;
- meltano run only
- invoked by mapping name instead of plugin name
- arbitrary number of mappers can run between tap/target
- config is manual for now, but there's an issue on the backlog already.
- failure in a mapper will exit the job
Thank you for coming to my ted talk demo!
ps: We're hiring - come fix my terrible python - https://meltano.com/jobs