Roy Seto royseto
The following is what I got after merging the latest commit (fa9ee2025870119260770453cd11deae2baa0a12) from https://github.com/Cascading/CoPA this evening (2013-02-03). On my system at least, the Gradle build doesn't find the correct dependencies with Cascading 2.1.x, but it builds with Cascading 2.0.x.
roy-setos-macbook:~ royseto$ java -version
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
roy-setos-macbook:~ royseto$ brew ls --versions | grep hadoop
hadoop 1.1.1 hdfs
roy-setos-macbook:~ royseto$ gradle -version
royseto / royseto-CoPA-lein-build-logs.txt
Last active December 12, 2015 03:18
Log of my leiningen build for my fork of git://github.com/Cascading/CoPA.git
This pair of transcripts shows the Leiningen build of
git://github.com/Cascading/CoPA.git (Cascalog version) on my system, a MacBook
Pro running OS X 10.8.2. The first transcript shows that the build fails using
Leiningen 1.7.1 and the second shows that it succeeds using Leiningen 2.0.0. I
installed Leiningen using Homebrew (http://mxcl.github.com/homebrew/).
I tested running the Hadoop job and confirmed it runs correctly, though this
is not shown in the transcript.
royseto / core.clj
Created March 23, 2013 08:43
Example output for Cascalog hfs-delimited with :strict? false
(ns example.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:gen-class))
(defn -main [in & args]
  (?<- (stdout)
       [?a ?b]
       ((hfs-delimited in :delimiter "|" :strict? false) ?a ?b)))
royseto / core.clj
Created March 24, 2013 06:10
Build log of a simple project depending on Cascalog 1.10.1
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:gen-class))
(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?doc ?line]
       ((hfs-delimited in :skip-header? true) ?doc ?line)))
royseto / 1 - solution using ctid and GROUP BY
Last active December 16, 2015 18:19
Dedup a relational table using SQL delete (tested with PostgreSQL)
roy-setos-macbook:~ royseto$ ssh -i ~/.ec2/rs-keypair-2.pem [email protected]
Welcome to Postgresql, TurnKey Linux 12.0 / Debian 6.0.5 Squeeze
System information (as of Sun Apr 28 09:57:02 2013)
System load: 0.00 Memory usage: 41%
Processes: 62 Swap usage: 0%
Usage of /: 7.1% of 9.84GB IP address for eth0: 10.197.60.2
TKLBAM (Backup and Migration): NOT INITIALIZED
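The gist's title mentions deleting duplicates with ctid and GROUP BY, but the SQL itself is not shown in this preview. The following is only a sketch of that general idea, not the author's query: it keeps one physical row per group of identical key values and deletes the rest. It runs against an in-memory SQLite database, with SQLite's `rowid` standing in for PostgreSQL's `ctid` (an assumption made so the example is self-contained). The `visits` table, its columns, and the `dedup` helper are hypothetical.

```python
import sqlite3


def dedup(conn, table, key_columns):
    """Delete duplicate rows, keeping the lowest rowid in each group.

    `table` and `key_columns` are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    keys = ", ".join(key_columns)
    # The subquery picks one surviving row per distinct key combination;
    # every other row in the table is deleted.
    sql = (
        f"DELETE FROM {table} "
        f"WHERE rowid NOT IN (SELECT MIN(rowid) FROM {table} GROUP BY {keys})"
    )
    cur = conn.execute(sql)
    conn.commit()
    return cur.rowcount  # number of rows deleted


# Hypothetical sample data: two extra copies of ("ann", "Oslo").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("ann", "Oslo"),
        ("ann", "Oslo"),
        ("bob", "Rome"),
        ("ann", "Oslo"),
        ("bob", "Lima"),
    ],
)

removed = dedup(conn, "visits", ["name", "city"])
remaining = conn.execute(
    "SELECT name, city FROM visits ORDER BY name, city"
).fetchall()
print(removed, remaining)
```

A physical row identifier is needed here because fully identical rows have no column value that tells them apart; whether the same statement shape works unchanged with `ctid` depends on the PostgreSQL version, so treat it as a starting point to adapt.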
royseto / config.yml
Created June 19, 2014 20:58
Local Docker registry config.yml
# The `common' part is automatically included (and possibly overridden by all
# other flavors)
common:
    loglevel: _env:LOGLEVEL:debug
    storage_redirect: _env:STORAGE_REDIRECT
    standalone: true
    index_endpoint: _env:INDEX_ENDPOINT
    disable_token_auth: _env:DISABLE_TOKEN_AUTH
    privileged_key: _env:PRIVILEGED_KEY
royseto / example-bwx-game-session.txt
Created January 3, 2015 22:41
example bwx-game session (with crash unfortunately)
$ ./bwx-game.py
--=( You are in the Sidewalk )=--
There is a large glass door to the east.
The sign says 'Come In!'
There are a few things here: Gary the garden gnome, a pebble, and a key.
Robby is here.
A cat is here.
Fido is here.
# Experiments with R aggregates and window functions
# See http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html
library(dplyr)
data(iris)
iris_pct <- iris %>%
  mutate(length_pct = Petal.Length / sum(Petal.Length))
iris_pct2 <- iris %>%
# Prototype for transforming a set of time intervals with arrival and departure
# timestamps into utilization by hour
library(lubridate)
# Period start and end
period_start <- ymd("2015-04-01")
period_end <- ymd("2015-05-01")
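The rest of this prototype is cut off in the preview, so the following is a sketch of one way to do what the comment describes, written in Python rather than R and not taken from the gist. It clips each (arrival, departure) interval to the period, splits it at hour boundaries, and sums the occupied time per hour. The function name `hourly_utilization` and the sample intervals are hypothetical; timestamps are assumed to be naive datetimes in a single time zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_utilization(intervals, period_start, period_end):
    """Return {hour_start: occupied_hours} for intervals clipped to the period.

    Overlapping intervals add up, so a value above 1.0 means more than one
    interval was active during that hour.
    """
    usage = defaultdict(float)
    for arrival, departure in intervals:
        # Ignore any part of the interval outside the reporting period.
        start = max(arrival, period_start)
        end = min(departure, period_end)
        while start < end:
            hour_start = start.replace(minute=0, second=0, microsecond=0)
            hour_end = hour_start + timedelta(hours=1)
            chunk_end = min(end, hour_end)
            # Credit this hour with the fraction of it that was occupied.
            usage[hour_start] += (chunk_end - start).total_seconds() / 3600.0
            start = chunk_end
    return dict(usage)


period_start = datetime(2015, 4, 1)
period_end = datetime(2015, 5, 1)

# Hypothetical sample data: (arrival, departure) pairs.
intervals = [
    (datetime(2015, 4, 1, 9, 30), datetime(2015, 4, 1, 11, 15)),
    (datetime(2015, 4, 1, 10, 0), datetime(2015, 4, 1, 10, 30)),
    (datetime(2015, 4, 30, 23, 30), datetime(2015, 5, 1, 2, 0)),
]

utilization = hourly_utilization(intervals, period_start, period_end)
for hour, hours_used in sorted(utilization.items()):
    print(hour, hours_used)
```

Splitting at hour boundaries keeps the arithmetic simple and makes the per-hour totals sum to the total clipped duration; dividing each value by the number of available slots would turn it into a utilization ratio.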