EMR configuration (replace <a-github-pat> with a valid GitHub PAT):
[{
  "configurations": [{
    "classification": "export",
    "properties": {"GITHUB_PAT": "<a-github-pat>"}
  }],
  "classification": "spark-env",
  "properties": {}
}]
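To avoid pasting the token into a file by hand, the configuration above can be generated from an environment variable. This is a minimal sketch: the function name `write_emr_config` and the output file name are hypothetical, and it assumes the PAT contains no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): write the EMR
# configuration to a file, reading the PAT from the GITHUB_PAT environment
# variable instead of hard-coding it.
write_emr_config() {
  # $1: path of the JSON file to create
  # ${GITHUB_PAT:?...} aborts with a message if the variable is unset or empty.
  cat > "$1" <<EOF
[{
  "configurations": [{
    "classification": "export",
    "properties": {"GITHUB_PAT": "${GITHUB_PAT:?set GITHUB_PAT first}"}
  }],
  "classification": "spark-env",
  "properties": {}
}]
EOF
}
```

For example, after exporting GITHUB_PAT, `write_emr_config emr-config.json` writes the file, which can then be pasted into the EMR console or passed to the AWS CLI with `--configurations file://emr-config.json`.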
Bootstrap action and parameters:
s3://sparklyrec2/emr-sparklyr.sh
--rstudio --shiny --sparkr --arrow --rstudio-url https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-pro-1.2.981-2-x86_64.rpm --user-pw <a-password>
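If the cluster is created from the AWS CLI rather than the console, the bootstrap action and its parameters could be passed roughly as follows. This is a sketch under assumptions: the cluster name, release label, application list, instance settings and the `emr-config.json` file name are placeholders that do not come from this document, and the function names are hypothetical. Only the script path and the arguments are taken from the lines above.

```shell
#!/bin/sh
# Script path and RStudio URL as given in this document.
BOOTSTRAP_PATH="s3://sparklyrec2/emr-sparklyr.sh"
RSTUDIO_URL="https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-pro-1.2.981-2-x86_64.rpm"

# Build the shorthand value expected by --bootstrap-actions.
# $1: password for --user-pw (must not contain commas or brackets,
#     since those are separators in the CLI shorthand syntax)
bootstrap_action() {
  printf 'Path=%s,Args=[--rstudio,--shiny,--sparkr,--arrow,--rstudio-url,%s,--user-pw,%s]\n' \
    "$BOOTSTRAP_PATH" "$RSTUDIO_URL" "$1"
}

# Hypothetical wrapper: every option value below except the bootstrap
# action is an assumption and should be adapted.
create_cluster() {
  aws emr create-cluster \
    --name "sparklyr-cluster" \
    --release-label emr-5.26.0 \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --configurations file://emr-config.json \
    --bootstrap-actions "$(bootstrap_action "$1")"
}
```

Calling `bootstrap_action '<a-password>'` on its own prints the shorthand string, which is a cheap way to review it before running `create_cluster '<a-password>'`.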
sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools.repo
sudo yum install -y --enablerepo=red-data-tools arrow-devel
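The sed step above replaces the yum variable $releasever with the literal 6 in the repo file, presumably so that yum requests the CentOS 6 packages instead of a path built from the Amazon Linux release string. The same substitution can be wrapped in a small function and tried on a copy of the file first. The function name `pin_releasever` is hypothetical, and GNU sed (for `-i`) is assumed.

```shell
#!/bin/sh
# Hypothetical helper: pin $releasever to a fixed release in a yum repo file.
pin_releasever() {
  # $1: repo file to edit in place
  # $2: release to substitute (6 in the commands above)
  # The pattern sed receives is s/\$releasever/<release>/g, where \$ matches
  # a literal dollar sign.
  sed -i "s/\\\$releasever/$2/g" "$1"
}
```

On a real system the file under /etc/yum.repos.d/ is owned by root, so the function would have to run in a root shell, or the sed call would need a sudo prefix as in the original command.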
Note on manual installation, which no longer works (tested on emr-5.26.0):
sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
generates two files in /etc/yum.repos.d/: red-data-tools-centos.repo and red-data-tools-amazon-linux.repo.
The red-data-tools-amazon-linux.repo repository returns a 404, which doesn't help; this is why I force the red-data-tools-centos repo.