@doppiomacchiatto
doppiomacchiatto / sentiment_classification.py
Created July 19, 2017 19:54 — forked from bonzanini/sentiment_classification.py
Sentiment analysis with scikit-learn
# You need to install scikit-learn:
# sudo pip install scikit-learn
#
# Dataset: Polarity dataset v2.0
# http://www.cs.cornell.edu/people/pabo/movie-review-data/
#
# Full discussion:
# https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn

Count stats for a Twitter stream and store them in Cassandra

cd $SPARK_HOME

./bin/spark-submit --packages TargetHolding/pyspark-cassandra:0.3.5 /Users/drehman/Apps/workspace/spark_cassandra_stream_example.py

python twitter_rolling_count.py -q data -d data 2>&1 | nc -lk 10.0.0.235 9999
doppiomacchiatto / ecs.json
Created June 28, 2017 02:00 — forked from caevyn/ecs.json
ecs definition
{
"taskDefinitionArn": "arn:aws:ecs:us-west-2:<scc number>:task-definition/build-blog:3",
"revision": 3,
"containerDefinitions": [
{
"volumesFrom": [],
"portMappings": [],
"command": [],
"environment": [
{
doppiomacchiatto / checkDockerDisks.sh
Created June 23, 2017 15:44 — forked from robsonke/checkDockerDisks.sh
This Bash script loops through all running Docker containers on a host and lists the disk usage per mount. If usage on a mount exceeds 65%, it emails you.
#!/bin/bash
# get all running docker container names
containers=$(sudo docker ps | awk '{if(NR>1) print $NF}')
host=$(hostname)
# loop through all containers
for container in $containers
do
    echo "Container: $container"
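The threshold check described for this script can be sketched in Python. This is a minimal illustration, not the gist's own code: it assumes the text being parsed is the output of `df -P` (for example from `docker exec <container> df -P`), that "breaching" means strictly above 65%, and the names `mounts_over_threshold` and `SAMPLE_DF` are hypothetical.

```python
THRESHOLD = 65  # percent; the limit named in the gist description


def mounts_over_threshold(df_output, threshold=THRESHOLD):
    """Return (mount, percent) pairs whose usage is above threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete df row
        percent_text = fields[4].rstrip('%')
        if not percent_text.isdigit():
            continue  # some pseudo filesystems report "-" instead of a number
        percent = int(percent_text)
        if percent > threshold:
            # Rejoin the tail in case the mount point contained spaces
            flagged.append((' '.join(fields[5:]), percent))
    return flagged


# Hypothetical sample; real input would come from running df in the container
SAMPLE_DF = """\
Filesystem     1024-blocks     Used Available Capacity Mounted on
overlay           41152736 30000000  11152736      73% /
tmpfs                65536        0     65536       0% /dev
/dev/sda1         41152736 20000000  21152736      49% /etc/hosts
"""

if __name__ == '__main__':
    for mount, percent in mounts_over_threshold(SAMPLE_DF):
        print(f"{mount} is at {percent}%")
```

Keeping the parsing separate from the Docker call and the email step makes the threshold logic easy to check on its own.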
import nltk
with open('sample.txt', 'r') as f:
    sample = f.read()
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
# Useful links:
# http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html
mkdir zookeeper
cd zookeeper
curl http://mirror.olnevhost.net/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz | tar xz
ln -s zookeeper-3.4.6/ latest
cat <<EOF > latest/conf/zoo.cfg
tickTime=2000
dataDir=$(pwd)/latest
clientPort=2181
# Copyright (c) 2012-2013 Mitch Garnaat http://garnaat.org/
# Copyright 2012-2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
doppiomacchiatto / role_arn_to_session.py
Created March 28, 2017 12:55 — forked from gene1wood/role_arn_to_session.py
Simple python function to assume an AWS IAM Role from a role ARN and return a boto3 session object
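The description above amounts to two steps: call STS `assume_role` with the role ARN, then build a session from the temporary credentials it returns. The following is a hedged sketch of that flow, not the gist's actual body (which is cut off in this listing). The name `role_arn_to_session_sketch` and the `sts_client` and `session_factory` parameters are assumptions added here so the flow can be exercised without AWS access.

```python
def role_arn_to_session_sketch(sts_client=None, session_factory=None, **args):
    """Assume the role described by **args and return a session.

    sts_client and session_factory are hypothetical injection points;
    when omitted, the real boto3 STS client and boto3.Session are used.
    """
    if sts_client is None or session_factory is None:
        import boto3  # imported lazily so stand-ins can be used without boto3
        sts_client = sts_client or boto3.client('sts')
        session_factory = session_factory or boto3.Session

    # assume_role returns temporary credentials under the 'Credentials' key
    response = sts_client.assume_role(**args)
    creds = response['Credentials']

    return session_factory(
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'])
```

Called with only `RoleArn` and `RoleSessionName`, it would be used the same way as in the usage docstring of the original function.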
import boto3


def role_arn_to_session(**args):
    """
    Usage :
        session = role_arn_to_session(
            RoleArn='arn:aws:iam::012345678901:role/example-role',
            RoleSessionName='ExampleSessionName')
        client = session.client('sqs')
    """
apply plugin: 'java'
apply plugin: 'spring-boot'
apply plugin: 'eclipse'
apply plugin: 'idea'
// mainClass = 'org.liqweed.boot.ServerStarter'
ext {
springVersion = '4.0.3.RELEASE'
springBootVersion = '1.0.1.RELEASE'
doppiomacchiatto / ls.py
Created February 24, 2017 17:38 — forked from jbeezley/ls.py
Recursively list files in s3
#!/usr/bin/env python
import sys
import json
from boto.s3.connection import S3Connection
from boto.s3.prefix import Prefix
from boto.s3.key import Key
bucketname = sys.argv[1]
delimiter = '/'
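The rest of this script is not shown in the listing. As a rough illustration of how a delimiter-based recursive listing can work, the sketch below walks one "directory" level at a time and descends into every entry that ends with the delimiter. It does not use boto: `make_lister` is a hypothetical in-memory stand-in for a per-level bucket listing, and `list_recursive` is likewise an assumed name.

```python
def make_lister(keys, delimiter='/'):
    """Build a stand-in for a one-level listing over an in-memory key list."""
    def list_level(prefix):
        seen = set()
        for key in sorted(keys):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if not rest:
                continue  # the key is the prefix itself (a folder marker)
            head, sep, _ = rest.partition(delimiter)
            entry = prefix + head + sep  # sep is '' for a leaf key
            if entry not in seen:
                seen.add(entry)
                yield entry
    return list_level


def list_recursive(list_level, prefix='', delimiter='/'):
    """Yield every leaf key under prefix, descending into sub-prefixes."""
    for entry in list_level(prefix):
        if entry.endswith(delimiter):
            # A common prefix: recurse one level down
            yield from list_recursive(list_level, entry, delimiter)
        else:
            yield entry


if __name__ == '__main__':
    lister = make_lister(['a/b.txt', 'a/c/d.txt', 'e.txt'])
    for name in list_recursive(lister):
        print(name)
```

With a real bucket, `list_level` would be replaced by a call that lists a single prefix using the same delimiter.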