This quick document highlight how allow Amazon Redshift to load data from S3 bucket in another AWS account and possibly in different AWS Region
AWS Account A
region us-west-2 -> S3 bucket my.bucket
| $ sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm | |
| $ sudo yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm | |
| $ sudo yum install yum-utils -y | |
| $ sudo yum update | |
| $ sudo yum remove php* | |
| $ sudo yum remove httpd* | |
| $ sudo yum clean all | |
| $ sudo yum upgrade -y | |
| $ sudo yum install httpd24 | |
| $ sudo yum install php72 php72-mysqlnd php72-imap php72-pecl-memcache php72-pecl-apcu php72-gd |
| sudo yum update -y | |
| #Python 2.7 installed | |
| python --version | |
| sudo yum -y install python36 | |
| sudo yum install python3 -y | |
| #Install Node | |
| curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.0/install.sh | bash | |
| . ~/.bashrc |
| -- Different ways to generate sequential Integer values in Redshift | |
| -- Option 1: (The following statement is executed from leader node, | |
| -- hence any join operation from tables stored in compute node may cause query to fail.) | |
| SELECT * FROM generate_series(0,100); | |
| -- Generates sequence of integer by incrementing two | |
| SELECT * FROM generate_series(0,100,2); | |
| -- Generates positive sequence of integers upto 11,110 |
| SELECT integer '1' id; | |
| SELECT timestamp '2010-10-31 01:00:00' test_datetime; | |
| -- Concat integer to text. Outputs 12 | |
| SELECT 1::text || 2::text concat_demo | |
| -- ************************************************************ | |
| -- Substract 10 days from 25 Jan 2018 | |
| SELECT '2018-01-25'::date - interval '10 day' val | |
| -- ************************************************************ | |
| -- Substract date via interval | |
| SELECT '2018-01-15'::date - interval '1 day' interval_val | |
| SELECT '2018-01-15'::date - interval '1 week' interval_val | |
| SELECT '2018-01-15'::date - interval '1 month' interval_val | |
| SELECT '2018-01-15'::date - interval '1 year' interval_val |
| Redshift outputs the following error, when SQL syntax error is encountered. | |
| ERROR: syntax error at or near "," Position: 937 | |
| In error message, position 937 denotes the character where the error is identified. To locate this, | |
| remove line feed from the SQL statement. | |
| Removing line feeds via Visual Studio Code. | |
| Paste the SQL statement, and open Find and Replace | |
| MAC: Option + Command + F | |
| Windows: Ctrl + H |
| -- Solution to use 'dateadd' function to add years, instead of 'interval' | |
| WITH recursive numbers(n) as | |
| ( | |
| SELECT 0 as n | |
| UNION all | |
| SELECT n + 1 | |
| FROM numbers n | |
| WHERE n.n <= datediff(year,'2000-01-01'::date,trunc(current_date)) | |
| ), | |
| date_list as ( |
| with ten as ( | |
| select 0 as number union | |
| select 1 union | |
| select 2 union | |
| select 3 union | |
| select 4 union | |
| select 5 union | |
| select 6 union | |
| select 7 union | |
| select 8 union |
| Data 1.json | |
| {"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"} | |
| {"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"} | |
| {"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"} | |
| Supported Schema | |
| val schema = StructType(List( | |
| StructField("fullname", StringType), | |
| StructField("sex", StringType), |