@alexwoolford
alexwoolford / pom.xml
Created July 4, 2015 23:33
Shaded jar for cascading test
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>woolford.io</groupId>
<artifactId>cascading-test</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
alexwoolford / build_cookie_url_graph.py
Created July 3, 2015 18:24
Python script to build a Neo4j graph from MySQL data
alexwoolford / get_alchemy_taxonomies.py
Created June 29, 2015 20:15
A quick & dirty script to get taxonomy data from the Alchemy API for a list of URLs
#!/usr/bin/env python
import urllib2
import urllib
import logging
import sys
import json
root = logging.getLogger()
root.setLevel(logging.DEBUG)
0063A85D-0926-491A-9ADB-43192934D4F8	ib.adnxs.com
0063A935-EE2E-4076-9518-1B9C75D32CAD	www.globo.com
0063AB85-F3E8-42FD-8097-5BA5503150A8	live.sekindo.com
0063AC72-FA38-480E-9F05-3532153164F2	g.adnxs.com
0063AF19-8FF9-47D8-994D-6E7EE45D7231	cdn-static.liverail.com
0063AFE8-64DD-45FE-A5D9-F1983EB7CB63	www.baking-cakes.net
0063B1D7-5A27-492A-BF54-F3C809C0222D	thedailybanter.com
0063B3A9-A8F4-4618-B06D-A03A80093AB2	b4.arcadeyum.com
0063B3A9-A8F4-4618-B06D-A03A80093AB2	b4.playtopus.com
0063B3FD-2FF5-4EBC-8BBC-073DCA98BC3D	www.inspiredmessiness.com
alexwoolford / arimaUpperLimitAlert
Created May 28, 2015 22:50
ARIMA upper limit alerter
#!/usr/bin/env r
#######################################################################################################################
#
# This script was inspired by this post:
# http://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series
#
# It creates a timeseries model for all the previous periods except the most recent one, calculates confidence
# intervals for alerts and warnings, and then returns:
# 2: most recent value falls outside the alert upper limit
# 1: most recent value falls outside the warning upper limit
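The preview of the R script is cut off here, so its actual model fit is not shown. As a rough illustration of the idea in the header comment (fit on every period except the most recent, then compare the most recent value against upper limits), here is a minimal Python sketch. It is not the author's code: a plain AR(1) model fitted by least squares stands in for the ARIMA fit, the function name and the `warn_z` / `alert_z` multipliers are hypothetical, and the return value 0 for "inside both limits" is an assumption because the original list is truncated.

```python
import math


def ar1_upper_limit_alert(series, warn_z=1.96, alert_z=2.58):
    """Return 2 (alert), 1 (warning) or 0 for the last value in `series`.

    Assumption: an AR(1) model y[t] = c + phi * y[t-1] replaces the ARIMA
    model of the original R script; the z multipliers are illustrative.
    """
    # Hold out the most recent observation; fit on everything before it.
    history, latest = series[:-1], series[-1]
    if len(history) < 4:
        raise ValueError("need at least four historical points")

    x = history[:-1]  # values at time t-1
    y = history[1:]   # values at time t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Ordinary least squares for slope (phi) and intercept (c).
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    c = mean_y - phi * mean_x

    # Residual standard deviation drives the width of the upper limits.
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / max(n - 2, 1))

    # One-step-ahead forecast for the held-out period.
    forecast = c + phi * history[-1]

    if latest > forecast + alert_z * sigma:
        return 2  # outside the alert upper limit
    if latest > forecast + warn_z * sigma:
        return 1  # outside the warning upper limit
    return 0      # assumed "no alert" code; not shown in the truncated source


history = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11]
status = ar1_upper_limit_alert(history + [50])
```

Only the upper side is checked, matching the "upper limit" wording of the gist title; a real ARIMA forecast interval would also account for parameter uncertainty, which this sketch ignores.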
# In[1]:
import re
import datetime
# In[2]:
log = """[23:40:45]ACCESS: Login: StaffMember/(StaffMember)
[23:41:09]ACCESS: Logout: *no key*/(StaffMember)
[23:41:09]ACCESS: Login: StaffMember/(John Smith)
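The notebook preview ends before any parsing code appears. Given the `re` and `datetime` imports and the line format visible above, one plausible way to split such lines into fields is sketched below. The regular expression, the group names (`category`, `action`, `key`, `name`) and the `parse_line` helper are assumptions made for illustration, not taken from the notebook.

```python
import datetime
import re

# Assumed layout: [HH:MM:SS]CATEGORY: Action: key/(name)
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]"
    r"(?P<category>\w+): "
    r"(?P<action>\w+): "
    r"(?P<key>[^/]+)/\((?P<name>[^)]*)\)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp text into a datetime.time object.
    fields["time"] = datetime.datetime.strptime(fields["time"], "%H:%M:%S").time()
    return fields


# The three lines visible in the preview, reused as sample input.
sample = """[23:40:45]ACCESS: Login: StaffMember/(StaffMember)
[23:41:09]ACCESS: Logout: *no key*/(StaffMember)
[23:41:09]ACCESS: Login: StaffMember/(John Smith)"""

events = [parse_line(line) for line in sample.splitlines()]
```

Returning `None` for non-matching lines keeps the sketch tolerant of other message types that the full log may contain.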
alexwoolford / hive_e_set.txt
Last active August 29, 2015 14:15
output from hive -e 'set;'
datanucleus.autoCreateSchema=true
datanucleus.autoStartMechanismMode=checked
datanucleus.cache.level2=false
datanucleus.cache.level2.type=none
datanucleus.connectionPoolingType=BONECP
datanucleus.fixedDatastore=false
datanucleus.identifierFactory=datanucleus1
datanucleus.plugin.pluginRegistryBundleCheck=LOG
datanucleus.rdbms.useLegacyNativeValueStrategy=true
datanucleus.storeManagerType=rdbms
alexwoolford / hadoop_conf-details_print-all-effective-properties.xml
Created February 20, 2015 05:59
The output from "hadoop conf-details print-all-effective-properties"
<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>mapreduce.job.ubertask.enable</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.delayed.delegation-token.removal-interval-ms</name><value>30000</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.max-completed-applications</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>io.bytes.per.checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name><value>104857600</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.client.submit.file.replication</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.shuffle.connection-keep-alive.enable</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.node
alexwoolford / hive-site.xml
Last active August 29, 2015 14:15
/opt/mapr/hive/hive-0.13/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
alexwoolford / mapr-installer.log
Last active August 29, 2015 14:15
output from /opt/mapr-installer/var/mapr-installer.log
awoolford@hadoop01:~$ cat /opt/mapr-installer/var/mapr-installer.log
2015-02-12 10:37:03,968 mapr-install 139 [INFO]:
2015-02-12 10:37:03,968 mapr-install 140 [INFO]: ================================
2015-02-12 10:37:03,968 mapr-install 141 [INFO]: Installer Version: 4.0.2.136 started
2015-02-12 10:37:03,971 common 398 [INFO]: Now querying package python-pycurl
2015-02-12 10:37:03,989 common 403 [INFO]: Package: python-pycurl
Status: install ok installed
Priority: optional
Section: python
Installed-Size: 215