monkut · February 6, 2023 07:06 · yaroslav-serhiichuk · Mar 11, 2018 · RARain · Nov 23, 2018
diff --git a/kafka-ubuntu16.04-install.rst b/kafka-ubuntu16.04-install.rst
 # referencing:
 # https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
 # https://chongyaorobin.wordpress.com/2015/07/08/step-by-step-of-install-apache-kafka-on-ubuntu-standalone-mode/


 1. Add 'kafka' user::

  $ sudo useradd kafka -m

 2. Install Java::

  $ sudo apt-get update
  $ sudo apt-get install default-jre
  
 3. Install zookeeper::

  $ sudo apt-get install zookeeperd
  
  .. note::
  
    After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.
    
 4. Confirm zookeeper is running on expected port::

  $ telnet localhost 2181
  Trying ::1...
  Connected to localhost.
  Escape character is '^]'.
  ruok <-- Type at empty prompt!
  imokConnection closed by foreign host.
  
  .. note::
  
    After you type 'ruok' in the connected session, ZooKeeper responds with 'imok' and closes the session.
    
 5. Download kafka from http://kafka.apache.org/downloads.html::

  # with cntlm proxy installed and running if necessary
  $ export http_proxy=http://127.0.0.1:8009
  $ export https_proxy=http://127.0.0.1:8009
  # grab latest stable
  $ wget http://ftp.jaist.ac.jp/pub/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz

 6. Untar and move binaries to /usr/local/kafka::

  $ tar xvf kafka_2.11-0.10.0.0.tgz
  $ sudo mv kafka_2.11-0.10.0.0 /usr/local/kafka
  
 7. Configure Kafka Server::
  
  # turn on topic delete
  $ vi /usr/local/kafka/config/server.properties
  
  #>> At end of file add:
  delete.topic.enable = true
  
  # save and quit
  
 8. Test Server::

  $ /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
  ...
  [2016-08-06 01:22:00,000] INFO [Kafka Server 0], started (kafka.server.KafkaServer)

  .. note::
  
    This only starts the server temporarily for initial testing; the service should be registered later.
    
 9. With the Kafka server running, open another session and create a topic::

  $ /usr/local/kafka/bin/kafka-topics.sh --create --topic topic-test --zookeeper localhost:2181 --partitions 1 --replication-factor 1
  Created topic "topic-test".
  
 10. List available topics::

  $ /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
  topic-test
  
  .. note::
  
    You should see the newly created 'topic-test' topic listed.
    
 11. Send a message to the topic as a producer via 'kafka-console-producer.sh'::
  
  $ echo "hello world" | /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
    
 12. *Consume* the sent message::

  $ /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic-test --from-beginning
   
  .. note::
  
    The '--from-beginning' flag starts the consumer at the earliest message present in the log rather than at the latest message. (See the */usr/local/kafka/bin/kafka-console-consumer.sh* help for more option details.)
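
 The interactive checks in steps 4 and 7 can also be scripted. What follows is a minimal sketch, not part of the original procedure: the function names 'zk_reply_ok', 'check_zookeeper' and 'enable_topic_delete' are hypothetical, and it assumes that 'nc' (netcat) and GNU 'sed' are installed.

```shell
#!/bin/sh
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Succeeds only if the given reply is exactly ZooKeeper's healthy answer.
zk_reply_ok() {
  [ "$1" = "imok" ]
}

# Sends the 'ruok' command without opening an interactive telnet session.
# Assumes 'nc' is installed; host and port default to localhost:2181.
check_zookeeper() {
  zk_host="${1:-localhost}"
  zk_port="${2:-2181}"
  zk_reply="$(printf 'ruok' | nc -w 2 "$zk_host" "$zk_port" 2>/dev/null)"
  zk_reply_ok "$zk_reply"
}

# Sets delete.topic.enable=true in the given properties file.
# An existing entry is rewritten in place; otherwise one is appended,
# so running the function twice leaves a single entry.
enable_topic_delete() {
  props_file="$1"
  if grep -q '^delete\.topic\.enable' "$props_file"; then
    sed -i 's/^delete\.topic\.enable.*/delete.topic.enable=true/' "$props_file"
  else
    # Leading newline guards against a file that lacks a final newline.
    printf '\ndelete.topic.enable=true\n' >> "$props_file"
  fi
}
```

 With these defined, 'check_zookeeper && echo "zookeeper is up"' stands in for the telnet session, and 'enable_topic_delete /usr/local/kafka/config/server.properties' stands in for the manual edit in vi.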
    
  


 ----
 # for install of scala (sbt): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
 Spark installation

 1. Install Scala Build Tool (sbt) [Make sure https_proxy is set if needed]:

 1.1 Get Scala Build Tool Ubuntu repository info::
  
  wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb

 1.2 Install sbt repository info::

  sudo dpkg -i sbt-0.13.11.deb
  
 1.3 Update repository info and install 'sbt'::

  sudo apt-get update
  sudo apt-get install sbt
  
 2. Download the Spark binary (grab the latest stable from: http://spark.apache.org/downloads.html)::

  wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
  
 3. Untar and move::

  tar xvf spark-2.0.0-bin-hadoop2.7.tgz
  sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark



  
 4. Add Spark configuration to your profile (or appropriate ENV configuration)::

  vi ~/.profile
  (Add the following to .profile)
  # set PATH so it includes user's private bin directories
  PATH="/usr/local/spark/bin:$HOME/bin:$HOME/.local/bin:$PATH"
  export PYSPARK_PYTHON=python3

 5. Apply to current ENV::

  source ~/.profile

 6. Test configuration::

  pyspark
  
  --> Should open the pyspark console
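
 If 'pyspark' is not found, it can help to confirm that the profile changes actually reached the current shell. The following is a minimal sketch; 'path_contains' is a hypothetical helper, and the directory is the /usr/local/spark/bin location used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given directory is an exact PATH entry.
# Wrapping both sides in ':' prevents partial matches such as /usr/local/spark.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains /usr/local/spark/bin; then
  echo "spark bin directory is on PATH"
else
  echo "spark bin directory is missing from PATH; try: source ~/.profile"
fi

# Show which interpreter pyspark will be asked to use, if any was exported.
echo "PYSPARK_PYTHON=${PYSPARK_PYTHON:-<unset>}"
```

 The check only inspects environment variables, so it is safe to run before Spark itself is started.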