Skip to content

Instantly share code, notes, and snippets.

@kyle0r
Last active March 18, 2021 12:15
Show Gist options
  • Save kyle0r/10501a2078a0b1cd198891e971db5dd9 to your computer and use it in GitHub Desktop.
Save kyle0r/10501a2078a0b1cd198891e971db5dd9 to your computer and use it in GitHub Desktop.
export oracle tables to portable raw bytes via named pipe, delimiter separated format (tsv|csv|tabnull|etc), compressed and protected/encrypted by gpg

export oracle tables to portable raw bytes via named pipe, delimiter separated format (tsv|csv|tabnull|etc), compressed and protected/encrypted by gpg

HOWTO / TL;DR

  1. protect yourself and your system, do not run as root.

  2. curl -LO https://gist.github.com/kyle0r/10501a2078a0b1cd198891e971db5dd9/raw/run-export.sh

  3. run run-export.sh overriding the env vars as needed e.g.:

    EDITOR=vim GPG_RECIPIENT_KEY=FFFFFF WORKING_DIR_PATH=/var/tmp/your-preferred-path /bin/bash run-export.sh

  4. in theory the script should do the rest for you, or show you what is missing/wrong.

    • follow the guidance in the script. There are multiple opportunities to abort the process and/or correct issues.
  5. the script expects things to go wrong and is designed to be run multiple times until success is reached.

  6. you can debug the script with /bin/bash -x

  7. bear in mind that the script spawns’ processes waiting to read on a named pipe, if nothing is written they will stall and wait in the background. In between invocations of run-export.sh Keep an eye on ps xuf for detached/zombie processes, and kill and clean them up.

TODO

  1. support specifying a specific oracle db hostname and port
  2. support selecting a specific database
  3. checksums and ways to verify the table contents are correctly exported. i.e. data quality assertions.
    1. e.g. tee the read from the fifo, one output to the existing flow, and another for checksums/assertions.

looks like its related to what comes after the @ in the sqlplus arguments. Here is a good DBA post with infos.

design

create-export-proc.sql->dump_dsv_fast is a standalone generic oracle proc for exporting tables to raw delimiter separated files.
run-export.sh is a helper/runner script that utilises dump_dsv_fast, pigz|gzip, sqlplus, gpg.
run-export.sh auto-fetches the latest revision of create-export-proc.sql if a copy doesn't exist in WORKING_DIR_PATH

why does this gist exist?

oracle exp and expdp (data pump) tools export data in a propriety/binary format which is basically useless and non-interoperable with non-oracle systems, and often non-interoperable between varying oracle versions.

I had some valuable multi-billion record datasets in oracle systems that were being decommissioned, so I naturally wanted a backup for later use.

I wanted to export tables out of oracle in a platform independent (portable) way. i.e. raw bytes. I wanted to bypass issues and limitations with propriety/binary formats. The results could be compressed and encrypted with standard tools and used later in any system/platform. e.g. ingestion to druid.

what did it cost?

directly, a Saturday afternoon+evening on 2021-01-23.

what should it do?

export oracle tables to portable raw bytes via named pipe, delimiter separated format (tsv|csv|tabnull|etc), compressed and protected/encrypted by gpg.

  1. this was created for a PCI Level 1 env processing card holder data (CHD).
  2. all written bytes should be encrypted (with a PCI DEK) and compressed with standard tools.
  3. there should be no temp/intermediate/transient files, written bytes should always be protected.
  4. an oracle proc (create-export-proc.sql->dump_dsv_fast) is created for exporting tables to raw bytes.
  5. a named pipe is created.
  6. for each table object listed in dbObjectList.txt
    1. a compression proc is started to read from the pipe, and waits for writes, it pipes out to the encryption proc.
    2. a encryption proc is started to read from the compression pipe.
    3. the encryption proc outputs to a table specific archive file.
    4. export raw bytes from oracle to the named pipe oracle-exp.fifo

citations and inspirations

inspired by James Abley How to convert an Oracle .dmp into a more portable format (related gist: https://gist.github.com/jabley/3713603)
and jareeq oracle utl_file encoding from utf8

useful info tom post: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:235814350980
useful info on oracle UTF8 https://community.oracle.com/tech/apps-infra/discussion/3514820/difference-between-al32utf8-and-utf8
useful info on sqlplus shell scripting http://www.dbatoolz.com/t/sqlplus-s-shell-scripting.html

prerequisites

  1. read the scripts carefully and understand what is going on.
  2. sqlplus assumes that oracle is running locally on default ports.
  3. note that if oracle starts writing to the fifo before a proc is reading the fifo, then oracle could present blocking IO issues.
  4. ensure you have enough storage space to dump the data in the WORKING_DIR_PATH.
  5. ensure that the terminal locale/encoding matches that of the oracle raw output to avoid encoding issues.
  6. ensure that the output encoding for the proc matches that of the shell to avoid encoding issues. Controlled by p_characterset
  7. ensure sqlplus is available in the shells $PATH.
function dump_dsv_fast(p_query in varchar2
,p_filename in varchar2
,p_dir in varchar2 default c_DEFAULT_DIRECTORY
,p_separator in varchar2 default chr(9) || chr(0)
,p_text_qualifier in varchar2 default ''
,p_header in boolean default true
,p_eol_format in varchar2 default chr(10)
,p_characterset in varchar2 default 'AL32UTF8'
,p_write_mode in varchar2 default 'WB') return number
/****************************************************
forked from: https://stackoverflow.com/a/51170728
author: jareeq
Formats:
AL32UTF8 --> full utf
WE8MSWIN1252 --> no pl chars
WE8MSWIN1250 --> pl ansi
Limitations: this one is faster filing buffer once
* 2014/05/13 --> change to raw (end line character)
* 2016/02/12 --> text qualifier
--> header on/off
--> line length adjust
--> text q
* 2016/03/09 --> codepage
*****************************************************/
is
l_output utl_file.file_type;
l_theCursor integer default dbms_sql.open_cursor;
l_columnValue varchar2(32760);
l_status integer;
l_separator varchar2(10);
l_colCnt number default 0;
l_descTbl dbms_sql.desc_tab;
l_cnt number default 0;
l_buffer raw(32767) := null;
begin
-- Ensure that the named DIRECTORY object is defined
CREATE OR REPLACE DIRECTORY TMP_EXPORTS as p_dir;
l_output := utl_file.fopen(TMP_EXPORTS, p_filename, p_write_mode, 32760);
dbms_sql.parse(l_theCursor, p_query, dbms_sql.native);
-- write the header
if p_header
then
dbms_sql.describe_columns(l_theCursor, l_colCnt, l_descTbl);
l_separator := '';
for i in 1 .. l_colCnt
loop
if p_characterset = 'EE8MSWIN1250'
then
utl_file.put_raw(l_output,
utl_raw.cast_to_raw(l_separator ||
p_text_qualifier || l_descTbl(i)
.col_name || p_text_qualifier));
else
utl_file.put_raw(l_output,
utl_i18n.string_to_raw(l_separator ||
p_text_qualifier || l_descTbl(i)
.col_name ||
p_text_qualifier,
p_characterset));
end if;
dbms_sql.define_column(l_theCursor, i, l_columnValue, 32760);
l_separator := p_separator;
end loop;
utl_file.put_raw(l_output, utl_raw.cast_to_raw(p_eol_format));
end if;
-- exception handling
for i in 1 .. 255
loop
begin
dbms_sql.define_column(l_theCursor, i, l_columnValue, 32760);
l_colCnt := i;
exception
when others then
if (sqlcode = -1007)
then
exit;
else
raise;
end if;
end;
end loop;
dbms_sql.define_column(l_theCursor, 1, l_columnValue, 32760);
l_status := dbms_sql.execute(l_theCursor);
-- write the rows
loop
exit when(dbms_sql.fetch_rows(l_theCursor) <= 0);
l_separator := '';
for i in 1 .. l_colCnt
loop
dbms_sql.column_value(l_theCursor, i, l_columnValue);
if p_characterset = 'EE8MSWIN1250'
then
l_buffer := l_buffer ||
utl_raw.cast_to_raw(l_separator || p_text_qualifier ||
l_columnValue || p_text_qualifier);
else
l_buffer := l_buffer ||
utl_i18n.string_to_raw(l_separator || p_text_qualifier ||
l_columnValue ||
p_text_qualifier, p_characterset);
end if;
l_separator := p_separator;
end loop;
utl_file.put_raw(l_output,
l_buffer || utl_raw.cast_to_raw(p_eol_format));
l_buffer := '';
l_cnt := l_cnt + 1;
end loop;
dbms_sql.close_cursor(l_theCursor);
utl_file.fclose(l_output);
return l_cnt;
end dump_dsv_fast;
#!/bin/bash
# path with enough space to export your objects
WORKING_DIR_PATH="${WORKING_DIR_PATH:-/var/tmp/oracle-exports}"
GPG_RECIPIENT_KEY="${GPG_RECIPIENT_KEY:-PUT_THE_GPG_PUB_KEY_ID_HERE}"
# TODO could make this an input/choice
# if EDITOR isn't set then use vim
EDITOR="${EDITOR:-$(command -v vim)}"
# create working dir if needed, and change into it
mkdir -p $WORKING_DIR_PATH; cd $_
finish() {
[ -e $WORKING_DIR_PATH/oracle-exp.fifo ] && rm -v oracle-exp.fifo
echo;echo 'end of script.'
}
trap 'finish' EXIT
echo;
echo "INFO: working dir: $WORKING_DIR_PATH"
echo;echo;
echo 'This script is expected to be run interactively.'
echo 'TIP: start a tmux or screen session. This could be a very long running batch depending on data sizes.'
read -rsp $'Press any key to continue or Ctrl+C to exit...\n' -n1 key
echo
gpg_path=$(command -v gpg)
if [ ! -x "$gpg_path" ]; then
echo 'this script requires the program: gpg. please abort and fix this issue.' 1>&2
read -rsp $'Press Ctrl+C to abort...\n' -n1 key
echo
fi
sqlplus_path=$(command -v sqlplus)
if [ ! -x "$sqlplus_path" ]; then
echo 'this script requires the program: sqlplus. please abort and fix this issue.' 1>&2
read -rsp $'Press Ctrl+C to abort...\n' -n1 key
sqlplus_path='echo FIXME sqlplus is not in $PATH'
echo
fi
# use pigz (parallel gzip) where available
[ -x "$(command -v pigz)" ] && gzip_path=$(command -v pigz) || gzip_path=$(command -v gzip)
locale
echo; echo; echo 'Check the shell locale matches the data you are going to export!'
echo 'The commands that run will inherit the shell locale.'
# give the user a pause to consider their next step
read -rsp $'Press any key to continue or Ctrl+C to abort...\n' -n1 key
echo;
df -h .
echo; echo; echo 'Please check the storage space requirements are OK for path:' $(pwd)
# give the user a pause to consider their next step
read -rsp $'Press any key to continue or Ctrl+C to abort...\n' -n1 key
# download the export proc script
if [ ! -s create-export-proc.sql ]; then
echo
curl -LO https://gist.github.com/kyle0r/10501a2078a0b1cd198891e971db5dd9/raw/create-export-proc.sql
if [[ 0 != $? ]]; then
echo;echo 'something went wrong with the curl. please abort and fix this issue.' 1>&2
read -rsp $'Press Ctrl+C to abort...\n' -n1 key
fi
fi
# best practice, code review what you just downloaded before executing
echo; echo "your EDITOR ($EDITOR) will now open so you can code review the oracle proc script."
read -rsp $'Press any key to continue or Ctrl+C to abort...\n' -n1 key
$EDITOR create-export-proc.sql
# the table list to export
echo;echo "your EDITOR ($EDITOR) will now open so you can add/edit/delete the list of tables that should be exported (one per line)."
echo "Don't forget to _WRITE_ and quit when you are done."
read -rsp $'Press any key to continue or Ctrl+C to abort...\n' -n1 key
$EDITOR dbObjectList.txt
# username entry
if [ -z "$ORACLE_UN" ]; then
echo; read -r -p "enter oracle username: " ORACLE_UN; export ORACLE_UN; echo;
fi
# non-echo password entry
if [ -z "$ORACLE_PW" ]; then
echo; read -rs -p "enter oracle password (no echo): " ORACLE_PW; export ORACLE_PW; echo;
fi
# create the proc in oracle
$sqlplus_path -s ${ORACLE_UN}/${ORACLE_PW} @create-export-proc.sql
if [[ 0 != $? ]]; then
echo;echo 'something went wrong creating export proc. please abort and fix this issue.' 1>&2
read -rsp $'Press Ctrl+C to abort...\n' -n1 key
fi
# create a named pipe to ensure sensitive bytes are not written to persistent storage.
mkfifo oracle-exp.fifo
echo; echo;
# loop through the objects/tables to export
for dbObject in $(<dbObjectList.txt); do
echo "running export for $dbObject"
date --rfc-3339=ns
# start proc to gzip compress the byte stream, and pipe to gpg for encryption
( $gzip_path --best --stdout < oracle-exp.fifo | $gpg_path --trust-model always --encrypt -r $GPG_RECIPIENT_KEY > ${dbObject}_raw_extract.z.gpg ) &
# export the object/table from oracle, output to the named pipe
time $sqlplus_path -s ${ORACLE_UN}/${ORACLE_PW} <<EOF
exec dump_dsv_fast(
p_query => 'SELECT * FROM ${dbObject}'
,p_filename => 'oracle-exp.fifo'
,p_dir => '${WORKING_DIR_PATH}'
);
quit;
EOF
wait # for good measure, wait for the previous bg proc to finish with the fifo.
sync # for good measure ensure bytes are flushed to target inode
date --rfc-3339=ns
echo "iteration completed for object: $dbObject"
echo
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment