neil kodner neilkod

@neilkod
neilkod / gist:1157004
Created August 19, 2011 15:08
bteq question
Goal: set an environment variable to the result of a Teradata query.
-bash-3.1$ cat test_bteq.cmd
.logon xxx.yyy.zzz.bbb/username,password
select count(*) from sql_class.orders;
.QUIT
I would like to know if it's possible to do something like this:
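One way to approach the goal above is to run the script through the client, capture its standard output, and pull the number out of it. The sketch below is not from the original gist. It assumes that bteq reads the script on standard input and that the count appears on a line by itself in the output; the helper names and the `ORDER_COUNT` variable are hypothetical.

```python
import os
import re
import subprocess


def run_and_capture(command, script_path):
    """Run `command` with the script file on stdin and return its stdout as text."""
    with open(script_path) as script:
        result = subprocess.run(command, stdin=script, capture_output=True, text=True)
    return result.stdout


def last_integer_line(output):
    """Return the last line of `output` that holds only an integer, or None.

    Assumption: the query result is printed on a line by itself,
    surrounded by banner and status lines that are not pure digits.
    """
    for line in reversed(output.splitlines()):
        if re.fullmatch(r"\s*\d+\s*", line):
            return line.strip()
    return None


def set_env_from_output(name, output):
    """Store the extracted integer in os.environ[name] and return it (or None)."""
    value = last_integer_line(output)
    if value is not None:
        os.environ[name] = value
    return value


# Hypothetical usage, assuming bteq is on the PATH:
#   output = run_and_capture(["bteq"], "test_bteq.cmd")
#   set_env_from_output("ORDER_COUNT", output)
```

A variable set through `os.environ` is visible only to this Python process and the child processes it starts. It does not change the environment of the shell that launched the script, so a calling shell would still need to capture the printed value itself.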
neilkod / bitmask.pig
Created July 18, 2011 21:08
Interested to know how Pig optimizes this statement.
Does each MAX() get executed once or twice?
max_buckets = foreach grouped_buckets
generate group as reg
, MAX(buckets.is_drive_by) as is_drive_by
, MAX(buckets.involved) as is_involved
, MAX(buckets.engaged) as is_engaged
, MAX(buckets.is_drive_by) + MAX(buckets.involved) + MAX(buckets.engaged) as total;
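For reference, the computation the statement above asks for can be sketched in Python with each maximum evaluated once and then reused for the total. This only illustrates the intended result and says nothing about what Pig's optimizer actually does; the `summarize` function and the sample rows are hypothetical.

```python
def summarize(group_key, buckets):
    """Compute the per-group maxima once and reuse them for the total."""
    is_drive_by = max(b["is_drive_by"] for b in buckets)
    is_involved = max(b["involved"] for b in buckets)
    is_engaged = max(b["engaged"] for b in buckets)
    return {
        "reg": group_key,
        "is_drive_by": is_drive_by,
        "is_involved": is_involved,
        "is_engaged": is_engaged,
        # The total reuses the values above instead of recomputing each max.
        "total": is_drive_by + is_involved + is_engaged,
    }


# Hypothetical rows for one group; each flag is 0 or 1.
rows = [
    {"is_drive_by": 1, "involved": 0, "engaged": 0},
    {"is_drive_by": 0, "involved": 1, "engaged": 0},
]
summary = summarize("email", rows)
```

In Pig, the same "compute once, then add" shape could be expressed by generating the three maxima in one FOREACH and summing the resulting fields in a second one, which avoids relying on the optimizer either way.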
neilkod / commands.pig
Created July 11, 2011 14:41
attempt at pig style
raw = LOAD 'hbase://user_info_helix'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'alias:helix profile:dw.last_sess_dt alias:*', '-loadKey')
AS (reg_method:chararray, helix_id:chararray, last_sess_dt:chararray, alias_map:map[]);
flattened = FOREACH raw
GENERATE reg_method as reg_method
, helix_id as helix_id
, last_sess_dt as last_sess_dt
, FLATTEN(mapToBag(alias_map)) as (dynamic_reg:chararray,session_time:chararray);
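The effect of the FLATTEN step above can be pictured in Python as emitting one output row per entry of the alias map. This is a sketch under the assumption that `mapToBag` turns each map entry into a (key, value) tuple; the gist does not show that function, and the sample values below are hypothetical.

```python
def flatten_aliases(row):
    """Yield one output tuple per (key, value) entry in the row's alias map."""
    reg_method, helix_id, last_sess_dt, alias_map = row
    for dynamic_reg, session_time in alias_map.items():
        yield (reg_method, helix_id, last_sess_dt, dynamic_reg, session_time)


# Hypothetical input row with two alias entries.
raw_row = ("email", "helix-1", "2011-07-01", {"fb": "2011-06-30", "tw": "2011-07-01"})
flattened = list(flatten_aliases(raw_row))
```

A row whose map is empty produces no output rows here, which matches how FLATTEN behaves on an empty bag in Pig.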
neilkod / aretheyankeeswinning.py
Created July 5, 2011 23:41
source code for aretheyankeeswinning.com
#!/usr/bin/env python
#
# Copyright 2007 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
pig 0.9 with PIG-1782
Data is being loaded from HBase.
I can dump the data to the screen and write it to HDFS without problems. I am explicitly casting everything to a chararray before inserting back into HBase.
Trying to insert
new_helix_users: {ureg: chararray,helix_id: chararray,last_sess_dt: chararray,anon_map: map[]}
neilkod / my_weak_attempt_at_pyquery.py
Created June 24, 2011 02:29
working pyquery example
from pyquery import PyQuery
import urllib2
data = urllib2.urlopen('http://intermountainallergy.com/pollen.html')
d = PyQuery(data.read())
pollen_table = d('table.pollentable')
pollen_rows = pollen_table('tr')[1:-1]
for row in pollen_rows:
    fields = row.getchildren()
<table width="514" cellpadding="3" cellspacing="0" class="pollentable">
<tbody>
<tr>
<td width="100"></td>
<td width="400"><img src="images/polen_graph.png" alt="" width="399" height="43" border="0"></td>
</tr>
<!--START-->
<tr>
<td><strong>Oak</strong></td>
<td align="left"><img src="http://www.intermountainallergy.com/images/gl.jpg" width="80" height="10" alt="bar in graph"></td>
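Rows shaped like the markup above can also be read with the standard library alone. The sketch below is an alternative to the pyquery code, not the author's method: it uses `html.parser` to pair each name found in a `<strong>` tag with the `width` attribute of the bar image that follows it. Treating the width as the interesting value is an assumption, and the class name and the abbreviated sample markup are hypothetical.

```python
from html.parser import HTMLParser


class PollenTableParser(HTMLParser):
    """Collect (name, bar_width) pairs from rows of the pollen table."""

    def __init__(self):
        super().__init__()
        self.rows = []           # list of (name, bar_width) tuples
        self._in_strong = False  # True while inside a <strong> element
        self._name = None        # most recent name not yet paired with a bar

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
        elif tag == "img" and self._name is not None:
            # Pair the pending name with the width of the next image.
            width = int(dict(attrs).get("width", 0))
            self.rows.append((self._name, width))
            self._name = None

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        if self._in_strong and data.strip():
            self._name = data.strip()


# Abbreviated sample based on the markup above.
SAMPLE = """
<table class="pollentable"><tbody>
<tr><td width="100"></td><td width="400"><img src="images/polen_graph.png" width="399" height="43"></td></tr>
<tr><td><strong>Oak</strong></td>
<td align="left"><img src="http://www.intermountainallergy.com/images/gl.jpg" width="80" height="10" alt="bar in graph"></td></tr>
</tbody></table>
"""

parser = PollenTableParser()
parser.feed(SAMPLE)
pollen_rows = parser.rows
```

The header row's image is skipped because no name precedes it, which plays the same role as the `[1:-1]` slice in the pyquery version.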
neilkod / gist:1031579
Created June 17, 2011 14:57
spot check please
top ten voters
=== === ======
malcolm knipe 31
william platt 31
adrian cuthill 31
suresh iyer 32
henk vermeulen 32
yugant patra 34
eddie awad 34
balamohan manickam 35
neilkod / gist:1031536
Created June 17, 2011 14:35
oow votes
arjan kramer voted for kristina troutman 5 times
yugant patra voted for kristina troutman 5 times
sanjay singh voted for riyaj shamsudeen 5 times
xiaohuan xue voted for kristina troutman 5 times
henk vermeulen voted for alison coombe 5 times
alison coombe voted for alison coombe 5 times
paulo portugal voted for tariq farooq brainsurface 5 times
eric jacobs voted for kristina troutman 5 times
nate nelson voted for kristina troutman 5 times
eric grancher voted for riyaj shamsudeen 5 times
neilkod / date_problem.pig
Created June 14, 2011 14:46
I want pig to return "2011-06-14" instead of 2011 minus 6 minus 14
Problem: I'm trying to set a Pig variable to the current date in YYYY-MM-DD format, but Pig interprets YYYY-MM-DD as an arithmetic expression and evaluates it.
How can I coerce Pig into accepting YYYY-MM-DD as a chararray? The cast operator isn't helping here.
watch:
-bash-3.1$ date +%Y\-%m\-%d
2011-06-14
# 2011 minus 6 minus 14
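One possible workaround, sketched here in Python, is to build the date string outside Pig and hand it in through Pig's `-param` option, then refer to it inside the script in single quotes (for example `'$run_date'`) so that the substituted text is read as a string literal and not as a subtraction. The parameter name `run_date` and the helper below are hypothetical, and whether the quoted form fits the original script is an assumption.

```python
from datetime import date


def pig_command(script_path, run_date=None):
    """Build a pig command line that passes the date as a named parameter."""
    run_date = run_date or date.today()
    stamp = run_date.strftime("%Y-%m-%d")
    # Inside the script the value would be used as '$run_date' (quoted),
    # so the substituted text is a string and not an arithmetic expression.
    return ["pig", "-param", "run_date=" + stamp, script_path]


command = pig_command("date_problem.pig", date(2011, 6, 14))

# Evaluated as arithmetic, the unquoted text is an ordinary subtraction.
unquoted_value = 2011 - 6 - 14
```

The resulting list can be passed to `subprocess.run` as is, which avoids any shell quoting of the date value.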