Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112
For older versions of Spark and IPython, see the previous version of this text.
Kafka binary files: http://kafka.apache.org/downloads.html
At least 2 AWS machines: AWS EMR or EC2 is preferable
A Kafka manager utility to monitor the cluster: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Use these rapid keyboard shortcuts to control the GitHub Atom text editor on macOS.
| import pandas as pd | |
| # http://blog.yhathq.com/static/misc/data/WineKMC.xlsx | |
| df_offers = pd.read_excel("./WineKMC.xlsx", sheet_name=0) | |
| df_offers.columns = ["offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak"] | |
| df_offers.head() | |
| df_transactions = pd.read_excel("./WineKMC.xlsx", sheet_name=1) | |
| df_transactions.columns = ["customer_name", "offer_id"] | |
| df_transactions['n'] = 1 | |
| df_transactions.head() |
In the keyboard shortcuts below, I use capital letters for readability, but this does not imply Shift. If Shift is needed, I will say so. So ⌘ + D does not mean holding Shift, whereas ⌘ + Shift + D does.
| Function | Shortcut |
|---|---|
| New Tab | ⌘ + T |
| Close Tab or Window | ⌘ + W (same as many Mac apps) |
| Go to Tab | ⌘ + Number Key (e.g. ⌘ + 2 is the 2nd tab) |
| Go to Split Pane by Direction | ⌘ + Option + Arrow Key |
| # Time Series Testing | |
| import keras.callbacks | |
| from keras.models import Sequential | |
| from keras.layers.core import Dense, Activation, Dropout | |
| from keras.layers.recurrent import LSTM | |
| # Call back to capture losses | |
| class LossHistory(keras.callbacks.Callback): | |
|     def on_train_begin(self, logs={}): | |
|         self.losses = [] |
| from sklearn.datasets import fetch_20newsgroups | |
| from sklearn.feature_extraction.text import TfidfVectorizer | |
| from sklearn.linear_model import LogisticRegression | |
| from sklearn.pipeline import Pipeline | |
| import pandas as pd | |
| import numpy as np | |
| # Grab just two categories from the 20 newsgroups dataset | |
| categories=['sci.space', 'rec.autos'] |
bin/kafka-topics.sh --zookeeper localhost:2181 --list
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000
... wait a minute ...
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic mytopic --delete-config retention.ms
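The commands above look like the common trick for purging a topic: temporarily lower `retention.ms` so the broker discards old messages, wait, then delete the override so the topic falls back to its default retention. Below is a minimal Python sketch of the same sequence, assuming it is run from the Kafka installation directory. The helper names `purge_commands` and `purge_topic` are hypothetical, and the 60-second wait only mirrors the "wait a minute" step; the broker's retention check interval may require a longer pause.

```python
import subprocess
import time


def purge_commands(topic, zookeeper="localhost:2181", retention_ms=1000):
    """Build the two kafka-topics.sh invocations shown above (hypothetical helper).

    Returns (lower, restore): the command that sets a short retention.ms
    and the command that removes that override again.
    """
    base = ["bin/kafka-topics.sh", "--zookeeper", zookeeper,
            "--alter", "--topic", topic]
    lower = base + ["--config", "retention.ms={}".format(retention_ms)]
    restore = base + ["--delete-config", "retention.ms"]
    return lower, restore


def purge_topic(topic, wait_seconds=60, **kwargs):
    """Run both commands with a pause in between (hypothetical helper)."""
    lower, restore = purge_commands(topic, **kwargs)
    subprocess.check_call(lower)    # shorten retention so old messages expire
    time.sleep(wait_seconds)        # give the broker time to delete segments
    subprocess.check_call(restore)  # drop the override, restoring the default
```

Calling `purge_topic("mytopic")` would run the same two `--alter` commands as above with a one-minute pause between them. Building the argument lists separately makes it possible to inspect the commands without touching a cluster.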
| from cassandra.cluster import Cluster | |
| from cassandra.auth import PlainTextAuthProvider | |
| import pandas as pd | |
| def pandas_factory(colnames, rows): | |
|     return pd.DataFrame(rows, columns=colnames) | |
| cluster = Cluster( | |
|     contact_points=['127.0.0.1'], | |
|     auth_provider=PlainTextAuthProvider(username='cassandra', password='cassandra') |
| import nltk | |
| import gensim | |
| sample="""Renewed fighting has broken out in South Sudan between forces loyal to the president and vice-president. A reporter in the capital, Juba, told the BBC gunfire and large explosions could be heard all over the city; he said heavy artillery was being used. More than 200 people are reported to have died in clashes since Friday. The latest violence came hours after the UN Security Council called on the warring factions to immediately stop the fighting. In a unanimous statement, the council condemned the violence "in the strongest terms" and expressed "particular shock and outrage" at attacks on UN sites. It also called for additional peacekeepers to be sent to South Sudan. | |
| Chinese media say two Chinese UN peacekeepers have now died in Juba. Several other peacekeepers have been injured, as well as a number of civilians who have been caught in crossfire. The latest round of violence erupted when troops loyal to President Salva Kiir and first Vice-President Riek Machar began sho |