EMR configuration (replace <a-github-pat> with a valid GitHub personal access token):
[{
  "classification": "spark-env",
  "properties": {},
  "configurations": [{
    "classification": "export",
    "properties": {"GITHUB_PAT": "<a-github-pat>"}
  }]
}]
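One way to avoid hand-editing the token into the JSON is to generate the file from an environment variable. This is a minimal sketch, not part of the original setup: the file name emr-config.json is hypothetical, it assumes GITHUB_PAT is already exported in the calling shell (falling back to the placeholder otherwise), and it assumes python3 is available for the syntax check.

```shell
# Assumption: GITHUB_PAT is exported in the calling shell; otherwise the
# placeholder from the configuration above is written unchanged.
GITHUB_PAT="${GITHUB_PAT:-<a-github-pat>}"

# Write the same spark-env/export configuration to a file.
# emr-config.json is a hypothetical file name chosen for this sketch.
cat > emr-config.json <<EOF
[{
  "classification": "spark-env",
  "properties": {},
  "configurations": [{
    "classification": "export",
    "properties": {"GITHUB_PAT": "${GITHUB_PAT}"}
  }]
}]
EOF

# Check that the generated file is syntactically valid JSON.
python3 -m json.tool emr-config.json > /dev/null && echo "emr-config.json is valid JSON"
```

The resulting file can then be pasted into the EMR configuration field or supplied wherever the cluster configuration is expected.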
Bootstrap action and parameters:
s3://sparklyrec2/emr-sparklyr.sh
--rstudio --shiny --sparkr --arrow --rstudio-url https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-pro-1.2.981-2-x86_64.rpm --user-pw <a-password>
Install the Arrow development libraries from the red-data-tools repository:
sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools.repo
sudo yum install -y --enablerepo=red-data-tools arrow-devel
As per the Arrow project, on EMR 6.0.0, just run: