@javierluraschi
Last active April 19, 2020 23:15
Install Apache Arrow in Amazon EMR

Automated Install

EMR configuration (replace <a-github-pat> with a valid GitHub personal access token):

[{
  "classification":"spark-env",
  "properties":{},
  "configurations":[{
    "classification":"export",
    "properties":{"GITHUB_PAT":"<a-github-pat>"}
  }]
}]

Bootstrap action and parameters (replace <a-password> with a password of your choice):

s3://sparklyrec2/emr-sparklyr.sh
--rstudio --shiny --sparkr --arrow --rstudio-url https://s3.amazonaws.com/rstudio-ide-build/server/centos6/x86_64/rstudio-server-rhel-pro-1.2.981-2-x86_64.rpm --user-pw <a-password>

Manual Install

sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools.repo
sudo yum install -y --enablerepo=red-data-tools arrow-devel
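
The sed step replaces the yum variable $releasever in the repository file with a fixed CentOS release number. Here is a minimal sketch of what the substitution does, applied to a sample baseurl line. The sample line is an assumption for illustration, not copied from the real .repo file.

```shell
# Hypothetical baseurl line; the real red-data-tools.repo may differ.
sample='baseurl=https://packages.red-data-tools.org/centos/$releasever/$basearch/'

# Same sed expression as the manual install step, applied to the sample
# string instead of editing the file in place.
pinned=$(printf '%s\n' "$sample" | sed 's/\$releasever/6/g')

printf '%s\n' "$pinned"
# prints: baseurl=https://packages.red-data-tools.org/centos/6/$basearch/
```

The pin is presumably needed because EMR runs Amazon Linux, whose $releasever value is unlikely to match a CentOS release directory on the package server.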
@Hyurt

Hyurt commented Feb 13, 2020

Note on the manual installation, which no longer works as written (tested on emr-5.26.0). Use the following instead:

sudo yum install -y https://packages.red-data-tools.org/centos/red-data-tools-release-latest.noarch.rpm
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/red-data-tools-centos.repo
sudo yum install -y --enablerepo=red-data-tools-centos arrow-devel

The first command, which installs red-data-tools-release-latest.noarch.rpm, now generates two files in /etc/yum.repos.d/: red-data-tools-centos.repo and red-data-tools-amazon-linux.repo.

The red-data-tools-amazon-linux repository returns a 404, which does not help; this is why I force the red-data-tools-centos repository.
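
To see which repository ids the release package actually installed before choosing a value for --enablerepo, something like the following may help. It is a sketch only: the function name list_repo_ids is made up here, and it assumes the usual yum convention that each repository section starts with a [repo-id] header line.

```shell
# List the repository ids defined in the red-data-tools .repo files.
# The directory defaults to /etc/yum.repos.d; pass another path for a dry run.
list_repo_ids() {
  dir="${1:-/etc/yum.repos.d}"
  # Section headers in .repo files look like [repo-id]; strip the brackets.
  grep -h '^\[' "$dir"/red-data-tools*.repo 2>/dev/null | tr -d '[]'
}

list_repo_ids
```

Any id printed here can be passed to --enablerepo. On the setup described above, the expected ids would include red-data-tools-centos.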

@dgomesbr

As per the Arrow project's instructions, on EMR 6.0.0 just run:

  sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
  sudo yum install -y https://apache.bintray.com/arrow/centos/7/apache-arrow-release-latest.rpm
  sudo yum install -y --enablerepo=epel arrow-devel # For C++
  sudo yum install -y --enablerepo=epel arrow-glib-devel # For GLib (C)
  sudo yum install -y --enablerepo=epel parquet-devel # For Apache Parquet C++
  sudo yum install -y --enablerepo=epel parquet-glib-devel # For Parquet GLib (C)
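
After running the commands above, a quick check along these lines can confirm that the packages landed. This is a sketch, not part of the Arrow instructions: the function name missing_arrow_packages is made up, and it relies only on `rpm -q` exiting non-zero for a package that is not installed.

```shell
# Print the name of each Arrow package from the list above that is not installed.
missing_arrow_packages() {
  for pkg in arrow-devel arrow-glib-devel parquet-devel parquet-glib-devel; do
    # rpm -q exits non-zero when the package is absent.
    rpm -q "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
  done
}

missing_arrow_packages
```

No output means all four packages are present.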
