@ndamulelonemakh
Last active May 26, 2022 00:17
Miscellaneous scripts for installing big data tools for learning workloads
#!/bin/bash
# Script tested on Ubuntu 18.04
# 1. Make sure Java is installed
# If you intend to use Hive, use Java 8
sudo apt-get update -y
sudo apt-get install openjdk-8-jdk -y
# 2. Now we can install Hadoop
cd /tmp
DOWNLOAD_URL=https://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/hadoop-3.3.1.tar.gz
wget "$DOWNLOAD_URL"
tar -xvf "$(basename "$DOWNLOAD_URL")"
# /usr/local is normally writable only by root, hence sudo
sudo mv -v hadoop-3.3.1 /usr/local/hadoop
# To persist these settings after your shell session ends,
# edit $HOME/.bashrc and append the following statements
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre # Usually Java is installed here on Ubuntu 18.04
export HADOOP_HOME=/usr/local/hadoop
export PATH=${HADOOP_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=$(hadoop classpath)
# HIVE_HOME is not set by this script; only extend PATH if Hive is installed and HIVE_HOME is defined
[ -n "${HIVE_HOME:-}" ] && export PATH=${HIVE_HOME}/bin:${PATH}
export CLASSPATH=$CLASSPATH:$(hadoop classpath)
# Reload the shell session or type:
. ~/.bashrc
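# The note above about appending the export statements to $HOME/.bashrc can be
# sketched as a small helper that avoids duplicate lines when the script is re-run.
# This is only a sketch: append_once is a hypothetical helper name (not part of
# Hadoop or bash), and it assumes grep is available.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line already exists.
# (Hypothetical helper, for illustration only.)
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (uncomment to apply them to your own profile):
# append_once 'export HADOOP_HOME=/usr/local/hadoop' "$HOME/.bashrc"
# append_once 'export PATH=${HADOOP_HOME}/bin:${PATH}' "$HOME/.bashrc"
```

# The single quotes keep ${HADOOP_HOME} and ${PATH} unexpanded, so they are
# evaluated each time .bashrc is loaded rather than once at install time.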
# Then verify your installation
java -version
hadoop version
# 3. Post install configuration
# Set up a dedicated user for running Hadoop and make sure you set up SSH to allow password-less login for the new user
# sudo apt-get install -y ssh
# sudo apt-get install -y pdsh
# sudo adduser hadoop
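# The password-less login mentioned above could be set up roughly as follows,
# run as the new hadoop user. This is a sketch under assumptions:
# setup_passwordless_ssh is a hypothetical helper name, and an RSA key with an
# empty passphrase is assumed to be acceptable for a single-node learning setup.

```shell
# setup_passwordless_ssh [SSH_DIR]: create a key pair (if missing) and authorize it
# for logins to the same account. SSH_DIR defaults to $HOME/.ssh.
setup_passwordless_ssh() {
  local ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  # Generate a key with an empty passphrase unless one already exists
  [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa"
  # Add the public key to authorized_keys only if it is not already listed
  grep -qxFf "$ssh_dir/id_rsa.pub" "$ssh_dir/authorized_keys" 2>/dev/null \
    || cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Example (as the hadoop user):
# setup_passwordless_ssh && ssh localhost
```

# If "ssh localhost" still asks for a password, check that the ssh package from
# the step above is installed and that its service is running.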
# source: https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-windows
sudo apt install ocrmypdf -y
# Install Tesseract with support for all languages
# source: https://linuxhint.com/install-tesseract-ocr-linux
sudo apt install tesseract-ocr-all -y
sudo apt install imagemagick -y
# Usage: tesseract <image_name> <output_file_name>
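# The tesseract usage shown above can be extended to a whole folder of images.
# A minimal sketch, assuming the images are .png files: ocr_dir is a hypothetical
# helper name, and ./scans in the example is a placeholder directory.

```shell
# ocr_dir [DIR]: run tesseract on every .png in DIR (default: current directory),
# writing <name>.txt next to each image.
ocr_dir() {
  local dir="${1:-.}" img
  for img in "$dir"/*.png; do
    # Skip the literal pattern when no .png files matched
    [ -e "$img" ] || continue
    # The second argument is the output base name; tesseract adds .txt itself
    tesseract "$img" "${img%.png}"
  done
}

# Example:
# ocr_dir ./scans
```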