Created
April 10, 2018 04:31
-
-
Save goungoun/946caba852187a4f4b9ac05f3f877817 to your computer and use it in GitHub Desktop.
Embulk-GCP Test
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
아래 자료는 황장군님의 강의자료를 GCP에서 테스트한 결과입니다. | |
# ETL | |
- fluntd (streaming) | |
- embulk (batch) http://www.embulk.org/docs/ | |
embulk를 리눅스에 설치해보자. jar를 copy 하면 됨 | |
~~~bash | |
curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar" | |
chmod +x ~/.embulk/bin/embulk | |
echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc | |
source ~/.bashrc | |
~~~ | |
~~~ | |
gounna@etl3:~$ | |
gounna@etl3:~$ | |
gounna@etl3:~$ curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar" | |
% Total % Received % Xferd Average Speed Time Time Time Current | |
Dload Upload Total Spent Left Speed | |
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 | |
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 | |
100 43.4M 100 43.4M 0 0 14.2M 0 0:00:03 0:00:03 --:--:-- 22.9M | |
gounna@etl3:~$ chmod +x ~/.embulk/bin/embulk | |
gounna@etl3:~$ echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc | |
gounna@etl3:~$ source ~/.bashrc | |
gounna@etl3:~$ ll | |
total 32 | |
drwxr-xr-x 5 gounna gounna 4096 Apr 10 02:34 ./ | |
drwxr-xr-x 4 root root 4096 Apr 10 02:30 ../ | |
-rw-r--r-- 1 gounna gounna 220 Aug 31 2015 .bash_logout | |
-rw-r--r-- 1 gounna gounna 3809 Apr 10 02:34 .bashrc | |
drwx------ 2 gounna gounna 4096 Apr 10 02:30 .cache/ | |
drwxr-x--- 3 gounna gounna 4096 Apr 10 02:34 .embulk/ | |
-rw-r--r-- 1 gounna gounna 655 May 16 2017 .profile | |
drwx------ 2 gounna gounna 4096 Apr 10 02:34 .ssh/ | |
gounna@etl3:~$ | |
gounna@etl3:~$ | |
gounna@etl3:~$ embulk | |
/home/gounna/.embulk/bin/embulk: line 171: exec: java: not found | |
gounna@etl3:~$ java | |
The program 'java' can be found in the following packages: | |
* default-jre | |
* gcj-5-jre-headless | |
* openjdk-8-jre-headless | |
* gcj-4.8-jre-headless | |
* gcj-4.9-jre-headless | |
* openjdk-9-jre-headless | |
Ask your administrator to install one of them | |
gounna@etl3:~$ | |
~~~ | |
설치 | |
~~~ | |
sudo apt-get upgrade | |
sudo apt-get install default-jre | |
sudo apt-get install default-jdk | |
embulk gem install embulk-output-postgresql | |
~~~ | |
설치 후 확인 | |
~~~ | |
gounna@etl3:~$ embulk | |
Embulk v0.9.5 | |
Usage: embulk [-vm-options] <command> [--options] | |
Commands: | |
mkbundle <directory> # create a new plugin bundle environment. | |
bundle [directory] # update a plugin bundle environment. | |
run <config.yml> # run a bulk load transaction. | |
cleanup <config.yml> # cleanup resume state. | |
preview <config.yml> # dry-run the bulk load without output and show preview. | |
guess <partial-config.yml> -o <output.yml> # guess missing parameters to create a complete configuration file. | |
gem <install | list | help> # install a plugin or show installed plugins. | |
new <category> <name> # generates new plugin template | |
migrate <path> # modify plugin code to use the latest Embulk plugin API | |
example [path] # creates an example config file and csv file to try embulk. | |
selfupdate [version] # upgrades embulk to the latest released version or to the specified version. | |
VM options: | |
-E... Run an external script to configure environment variables in JVM | |
(Operations not just setting envs are not recommended nor guaranteed. | |
Expect side effects by running your external script at your own risk.) | |
-J-O Disable JVM optimizations to speed up startup time (enabled by default if command is 'run') | |
-J+O Enable JVM optimizations to speed up throughput | |
-J... Set JVM options (use -J-help to see available options) | |
-R... Set JRuby options (use -R--help to see available options) | |
Use `<command> --help` to see description of the commands. | |
~~~ | |
~~~bash | |
gounna@etl3:~$ embulk example etl3 | |
2018-04-10 02:44:22.887 +0000: Embulk v0.9.5 | |
Creating etl3 directory... | |
Creating etl3/ | |
Creating etl3/csv/ | |
Creating etl3/csv/sample_01.csv.gz | |
Creating etl3/seed.yml | |
Run following subcommands to try embulk: | |
1. embulk guess etl3/seed.yml -o config.yml | |
2. embulk preview config.yml | |
3. embulk run config.yml | |
gounna@etl3:~$ ls | |
etl3 | |
gounna@etl3:~$ cd etl3 | |
gounna@etl3:~/etl3$ ls -al | |
total 16 | |
drwxrwxr-x 3 gounna gounna 4096 Apr 10 02:44 . | |
drwxr-xr-x 6 gounna gounna 4096 Apr 10 02:44 .. | |
drwxrwxr-x 2 gounna gounna 4096 Apr 10 02:44 csv | |
-rw-rw-r-- 1 gounna gounna 84 Apr 10 02:44 seed.yml | |
~~~ | |
작업지시를 yml에 작성하게 된다. | |
데이터 소스의 접속정보와 무슨 컬럼을 가져와서 타겟쪽에 어떻게 적재할 것인지 | |
embulk가 해주는 일은 csv를 위부터 몇 줄을 읽어서 스스로 이런 형태라고 작업 지시를 반 정도 채워준다. | |
embulk 주요 명령어 | |
~~~ bash | |
embulk guess seed.yml // CSV 데이터 형태 추정 | |
embulk guess seed.yml -o etl3.yml // 추정한 것을 작업지시서로 생성 | |
embulk preview etl3.yml // 실행 전에 미리 몇 줄만 보기 | |
embulk run etl3.yml // 진짜 실행 | |
~~~ | |
~~~bash | |
gounna@etl3:~/etl3$ cat seed.yml | |
in: | |
type: file | |
path_prefix: '/home/gounna/etl3/csv/sample_' | |
out: | |
type: stdout | |
~~~ | |
~~~bash | |
gounna@etl3:~/etl3$ embulk guess seed.yml | |
2018-04-10 02:47:43.740 +0000: Embulk v0.9.5 | |
2018-04-10 02:47:44.420 +0000 [INFO] (main): Started Embulk v0.9.5 | |
2018-04-10 02:47:44.474 +0000 [INFO] (0001:guess): Listing local files at directory '/home/gounna/etl3/csv' filtering filename by prefix 'sample_' | |
2018-04-10 02:47:44.475 +0000 [INFO] (0001:guess): "follow_symlinks" is set false. Note that symbolic links to directories are skipped. | |
2018-04-10 02:47:44.479 +0000 [INFO] (0001:guess): Loading files [/home/gounna/etl3/csv/sample_01.csv.gz] | |
2018-04-10 02:47:44.491 +0000 [INFO] (0001:guess): Try to read 32,768 bytes from input source | |
2018-04-10 02:47:46.915 +0000 [INFO] (0001:guess): Gem's home and path are set by default: "/home/gounna/.embulk/lib/gems" | |
2018-04-10 02:47:47.833 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.5) | |
2018-04-10 02:47:47.853 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.5) | |
2018-04-10 02:47:47.872 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.5) | |
2018-04-10 02:47:47.888 +0000 [INFO] (0001:guess): Loaded plugin embulk (0.9.5) | |
in: | |
type: file | |
path_prefix: /home/gounna/etl3/csv/sample_ | |
decoders: | |
- {type: gzip} | |
parser: | |
charset: UTF-8 | |
newline: LF | |
type: csv | |
delimiter: ',' | |
quote: '"' | |
escape: '"' | |
null_string: 'NULL' | |
trim_if_not_quoted: false | |
skip_header_lines: 1 | |
allow_extra_columns: false | |
allow_optional_columns: false | |
columns: | |
- {name: id, type: long} | |
- {name: account, type: long} | |
- {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'} | |
- {name: purchase, type: timestamp, format: '%Y%m%d'} | |
- {name: comment, type: string} | |
out: {type: stdout} | |
Use -o PATH option to write the guessed config file to a file. | |
~~~ | |
~~~bash | |
gounna@etl3:~/etl3$ embulk preview etl3.yml | |
2018-04-10 02:56:46.703 +0000: Embulk v0.9.5 | |
2018-04-10 02:56:47.378 +0000 [INFO] (main): Started Embulk v0.9.5 | |
2018-04-10 02:56:47.436 +0000 [INFO] (0001:preview): Listing local files at directory '/home/gounna/etl3/csv' filtering filename by prefix 'sample_' | |
2018-04-10 02:56:47.437 +0000 [INFO] (0001:preview): "follow_symlinks" is set false. Note that symbolic links to directories are skipped. | |
2018-04-10 02:56:47.440 +0000 [INFO] (0001:preview): Loading files [/home/gounna/etl3/csv/sample_01.csv.gz] | |
2018-04-10 02:56:47.451 +0000 [INFO] (0001:preview): Try to read 32,768 bytes from input source | |
+---------+--------------+-------------------------+-------------------------+----------------------------+ | |
| id:long | account:long | time:timestamp | purchase:timestamp | comment:string | | |
+---------+--------------+-------------------------+-------------------------+----------------------------+ | |
| 1 | 32,864 | 2015-01-27 19:23:49 UTC | 2015-01-27 00:00:00 UTC | embulk | | |
| 2 | 14,824 | 2015-01-27 19:01:23 UTC | 2015-01-27 00:00:00 UTC | embulk jruby | | |
| 3 | 27,559 | 2015-01-28 02:20:02 UTC | 2015-01-28 00:00:00 UTC | Embulk "csv" parser plugin | | |
| 4 | 11,270 | 2015-01-29 11:54:36 UTC | 2015-01-29 00:00:00 UTC | | | |
+---------+--------------+-------------------------+-------------------------+----------------------------+ | |
gounna@etl3:~/etl3$ | |
~~~ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment