AbrahamJP tecmaverick

Summary

This quick document highlight how allow Amazon Redshift to load data from S3 bucket in another AWS account and possibly in different AWS Region

AWS Account A
region us-west-2 -> S3 bucket my.bucket

	$ sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
	$ sudo yum install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
	$ sudo yum install yum-utils -y
	$ sudo yum update
	$ sudo yum remove php*
	$ sudo yum remove httpd*
	$ sudo yum clean all
	$ sudo yum upgrade -y
	$ sudo yum install httpd24
	$ sudo yum install php72 php72-mysqlnd php72-imap php72-pecl-memcache php72-pecl-apcu php72-gd

	sudo yum update -y

	#Python 2.7 installed
	python --version
	sudo yum -y install python36
	sudo yum install python3 -y

	#Install Node
	curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.0/install.sh \| bash
	. ~/.bashrc

	-- Different ways to generate sequential Integer values in Redshift

	-- Option 1: (The following statement is executed from leader node,
	-- hence any join operation from tables stored in compute node may cause query to fail.)
	SELECT * FROM generate_series(0,100);

	-- Generates sequence of integer by incrementing two
	SELECT * FROM generate_series(0,100,2);

	-- Generates positive sequence of integers upto 11,110

	SELECT integer '1' id;
	SELECT timestamp '2010-10-31 01:00:00' test_datetime;

	-- Concat integer to text. Outputs 12
	SELECT 1::text \|\| 2::text concat_demo

	-- ************************************************************
	-- Substract 10 days from 25 Jan 2018
	SELECT '2018-01-25'::date - interval '10 day' val

	-- ************************************************************
	-- Substract date via interval
	SELECT '2018-01-15'::date - interval '1 day' interval_val
	SELECT '2018-01-15'::date - interval '1 week' interval_val
	SELECT '2018-01-15'::date - interval '1 month' interval_val
	SELECT '2018-01-15'::date - interval '1 year' interval_val

	Redshift outputs the following error, when SQL syntax error is encountered.
	ERROR: syntax error at or near "," Position: 937

	In error message, position 937 denotes the character where the error is identified. To locate this,
	remove line feed from the SQL statement.

	Removing line feeds via Visual Studio Code.
	Paste the SQL statement, and open Find and Replace
	MAC: Option + Command + F
	Windows: Ctrl + H

	-- Solution to use 'dateadd' function to add years, instead of 'interval'
	WITH recursive numbers(n) as
	(
	SELECT 0 as n
	UNION all
	SELECT n + 1
	FROM numbers n
	WHERE n.n <= datediff(year,'2000-01-01'::date,trunc(current_date))
	),
	date_list as (

	Data 1.json

	{"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"}
	{"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"}
	{"fullname": "Paisley Hoover", "sex": "female", "address": "Third Court Dr. Windermere, FL34786"}

	Supported Schema
	val schema = StructType(List(
	StructField("fullname", StringType),
	StructField("sex", StringType),