cascading.tuple.TupleException: operation added the wrong number of fields, expected: ['?pubdate', '?url', '?eml', '?dwca', '?title', '?icode', '?description', '?contact', '?orgname', '?email', '?rights-extra', '?icode-extra', '?count', '?id', '?associatedmedia', '?associatedoccurrences', '?associatedreferences', '?associatedsequences', '?associatedtaxa', '?basisofrecord', '?bed', '?behavior', '?catalognumber', '?collectioncode', '?collectionid', '?continent', '?coordinateprecision', '?coordinateuncertaintyinmeters', '?country', '?countrycode', '?county', '?datageneralizations', '?dateidentified', '?day', '?decimallatitude', '?decimallongitude', '?disposition', '?earliestageorloweststage', '?earliesteonorlowesteonothem', '?earliestepochorlowestseries', '?earliesteraorlowesterathem', '?earliestperiodorlowestsystem', '?enddayofyear', '?establishmentmeans', '?eventattributes', '?eventdate', '?eventid', '?eventremarks', '?eventtime', '?fieldnotes', '?fieldnumber', '?footprintspatialfit', '?footprintwkt', '?format
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/cartodb/impl/SecuredCartoDBClient : Unsupported major.minor version 51.0, compiling:(core.clj:1)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3387)
at clojure.lang.Compiler.compile1(Compiler.java:7035)
at clojure.lang.Compiler.compile1(Compiler.java:7025)
at clojure.lang.Compiler.compile(Compiler.java:7097)
at clojure.lang.RT.compile(RT.java:387)
at clojure.lang.RT.load(RT.java:427)
at clojure.lang.RT.load(RT.java:400)
at clojure.core$load$fn__4890.invoke(core.clj:5415)
at clojure.core$load.doInvoke(core.clj:5414)
@robinkraft
robinkraft / gist:5424034
Created April 19, 2013 23:53
Investigating issue where optional fields aren't getting serialized. The optional param-break field is getting lost on serialization. Making it required solves the problem.
(use 'forma.hadoop.jobs.runner-test)
(in-ns 'forma.hadoop.jobs.runner-test)
(def ts-dc
  "FormaValues wrapped in a TimeSeries object wrapped in a DataChunk"
  (let [[forma-vals] (series->forma-values nil [1. 2.] [2. 3.] [3. 4.] [4. 5.])
        loc (apply thrift/ModisPixelLocation* pix-loc)
        ts (thrift/TimeSeries* 827 forma-vals)
        dc (thrift/DataChunk* "trends" loc ts t-res)]
    dc))
robinkraft / gist:5383245
Created April 14, 2013 16:11
Elastic MapReduce command generated by lein-emr
elastic-mapreduce --create --alive --name dev --availability-zone us-east-1d --ami-version 2.0.5 --instance-group master --instance-type m2.4xlarge --instance-count 1 --instance-group core --instance-type m2.4xlarge --instance-count 5 --enable-debugging --bid-price 0.75 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive --bootstrap-action s3://elasticmapreduce/bootstrap-actions/add-swap --args 2048 --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "--site-config-file,s3://reddconfig/bootstrap-actions/config_new.xml,-s,mapred.reduce.tasks=120,-s,mapred.tasktracker.map.tasks.maximum=30,-s,mapred.tasktracker.reduce.tasks.maximum=24" --bootstrap-action s3://reddconfig/bootstrap-actions/forma_bootstrap_robin.sh
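The configure-hadoop bootstrap action above receives its settings as one comma-separated string, which is hard to read at a glance. As a rough sketch (the function name `parse_configure_hadoop_args` is hypothetical and not part of lein-emr or the EMR CLI), the string can be split into the site config file and the key/value pairs that follow each `-s` flag:

```python
def parse_configure_hadoop_args(arg_string):
    """Split a comma-separated flag/value string into its parts.

    Hypothetical helper for inspection only; it assumes the string is a
    flat sequence of flag,value pairs, as in the command above.
    """
    tokens = arg_string.split(",")
    site_config = None
    overrides = {}
    # Walk the tokens two at a time: a flag followed by its value.
    for i in range(0, len(tokens) - 1, 2):
        flag, value = tokens[i], tokens[i + 1]
        if flag == "--site-config-file":
            site_config = value
        elif flag == "-s":
            key, _, val = value.partition("=")
            overrides[key] = val
    return site_config, overrides


ARGS = (
    "--site-config-file,s3://reddconfig/bootstrap-actions/config_new.xml,"
    "-s,mapred.reduce.tasks=120,"
    "-s,mapred.tasktracker.map.tasks.maximum=30,"
    "-s,mapred.tasktracker.reduce.tasks.maximum=24"
)

site_config, overrides = parse_configure_hadoop_args(ARGS)
print(site_config)
for key in sorted(overrides):
    print(key, "=", overrides[key])
```

The values stay as strings because the sketch only reads the command back; it does not validate them against Hadoop.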
robinkraft / gist:5383160
Created April 14, 2013 15:50
latest update bash script - used for update from 9/29/12 to 2/2/13
#! /bin/bash
################
# Configurable #
################
SRES="500"
TRES="16"
#YEAR=$(date +%Y) # no longer used
MODISLAYERS="[:ndvi]" # :reli
robinkraft / gist:5349286
Last active December 16, 2015 00:39
Create a random image the size of a MODIS tile using Numpy and GDAL. N.B. this script creates tile 28 8.
from osgeo import gdal, gdal_array, osr
import numpy
from osgeo.gdalconst import GDT_Byte
xsize, ysize = 4800, 4800
a = numpy.random.randint(0, 100, (xsize, ysize)).astype(numpy.uint8)  # GDT_Byte is unsigned 8-bit
xmin, xmax = 100, 110.
ymin, ymax = 0, 10.
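The preview stops before the raster is georeferenced. As a minimal sketch of the usual next step, assuming a north-up image whose bounds are in the units of its spatial reference, the six-element affine geotransform that GDAL's `SetGeoTransform` expects can be derived from the bounds and the image size. The helper name `geotransform` is hypothetical and is not taken from the original script:

```python
def geotransform(xmin, xmax, ymin, ymax, xsize, ysize):
    """Return a GDAL-style geotransform tuple for a north-up raster.

    Hypothetical helper: the original script may compute this differently.
    """
    pixel_width = (xmax - xmin) / float(xsize)
    pixel_height = (ymax - ymin) / float(ysize)
    # Order: top-left x, pixel width, row rotation,
    #        top-left y, column rotation, negative pixel height.
    return (xmin, pixel_width, 0.0, ymax, 0.0, -pixel_height)


# Same bounds and dimensions as the snippet above.
gt = geotransform(100, 110., 0, 10., 4800, 4800)
print(gt)
```

The pixel height is negative because row indices increase downward while the y coordinate decreases from the top edge.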
robinkraft / sampledata.txt
Created April 4, 2013 01:11
Sample queries showing a problem with the VertNet wide source when using the Cascalog predicate operator :>>. Check out this diff for relevant recent changes to the project: https://github.com/VertNet/gulo/compare/feature/stats-queries...feature/new-stats-queries
Fri, 15 Jun 2012 16:05:03 -0500 http://ipt.vertnet.org:8080/ipt/resource.do?r=isu_mammals http://ipt.vertnet.org:8080/ipt/eml.do?r=isu_mammals http://ipt.vertnet.org:8080/ipt/archive.do?r=isu_mammals81e4afd9-0b61-483d-b7fa-0690f06c8e14 ISU Mammals 2e4967ed-fd35-4d34-ae4d-e8731d366e97 Illinois State University PreservedSpecimen 1Mammals North America United States McLean County 48.7288900000 -101.9727800000 1954-01-01 North America, United States, North Dakota, McLean County ISU Unknown Normal Skin only - 1 Female North Dakota 48.7288900° N 101.9727800° W 1954 Animalia Chordata Mammalia Rodentia Sciuridae Sciurus niger Sciurus niger
robinkraft / gist:5268273
Last active December 15, 2015 13:38
Stacktrace for issue with Kryo/Cascading/Cascalog in forma-clj (as of March 28, 2013).

Discussion here. The fix seems to have been rolled out in Kryo 2.17, which is picked up as a dependency in Cascalog 1.10.1. Possible fix in the forma-clj project here.

cascading.pipe.OperatorException: [1eceaf05-5138-48a1-af0...][cascalog.workflow$buffer$fn__4479.invoke(workflow.clj:249)] operator Every failed executing operation: ClojureBuffer[decl:'?pixel-idx', '?start', '?end', '?series']
	at cascading.flow.stream.BufferEveryWindow.receive(BufferEveryWindow.java:139)
	at cascading.flow.stream.BufferEveryWindow.receive(BufferEveryWindow.java:41)
	at cascading.flow.hadoop.stream.HadoopGroupGate.run(HadoopGroupGate.java:90)
	at cascading.flow.hadoop.FlowReducer.reduce(FlowReducer.java:129)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:527)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:428)
robinkraft / gist:5200606
Created March 19, 2013 22:10
Replication error with aggressively partitioned pail using Cascalog and dfs-datastores
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/cascalog_reserved/5b03e647-75d0-42a8-b50a-30944a4a5caf/_temporary/_attempt_201303191542_0001_m_000144_2/adjusted/500-16/2009-07-12/part-00144216.pailfiletmp could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
robinkraft / gist:5200079
Last active December 15, 2015 04:18
Cascading template tap example using Cascalog
;; dumps data into directories based on template, à la
;; /tmp/test-template/2012-01-01/a/part-00000
;; /tmp/test-template/2012-01-01/b/part-00000
;; /tmp/test-template/2013-12-31/a/part-00000
(let [src [["2012-01-01" "a" 2]
           ["2012-01-01" "b" 3]
           ["2013-12-31" "a" 4]]]
  (?- (hfs-seqfile "/tmp/test-template"
                   :sink-template "%s/%s"