Last active
August 29, 2015 14:25
Cascalog Workflow example
(defn -main
  [arg]
  (workflow ["/tmp/workflow"]
    read-data ([:tmp-dirs [data-path]]
      (import-data path1 path2))
    work-step ([:deps :all]
      (let [data (hfs-seqfile data-path)]
        (?- (hfs-textline output-path-1 :sinkmode :replace) (query1 data)
            (hfs-textline output-path-2 :sinkmode :replace) (query2 data))))))
In Cascalog, this is how you describe a workflow. To create a temporary directory or a specific checkpoint, you write something along the lines of [:tmp-dirs [data-path]].
Under the hood, I'm assuming this uses some Hadoop API to create the file locally and manage it. This works seamlessly both locally and in production. I'm wondering what the equivalent would be for doing the same in Cascading.

Then, within the FlowDef of the job, add .addCheckpoint(checkpoint, checkpointTap).
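
To make the addCheckpoint call concrete, here is a minimal sketch of how it might fit into a complete flow. It assumes Cascading 2.x on the Hadoop platform. The class name, pipe names, and the three paths are hypothetical placeholders and do not come from the original job. The pipe assembly is deliberately empty apart from the Checkpoint, so treat it as an outline rather than a drop-in replacement for the Cascalog workflow.

```java
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Checkpoint;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// Hypothetical class name; requires the Cascading and Hadoop jars on the classpath.
class CheckpointSketch {

  // Builds a FlowDef whose intermediate data is persisted at checkpointPath.
  static FlowDef buildFlowDef(String inputPath, String outputPath, String checkpointPath) {
    // Source and sink taps (paths are placeholders supplied by the caller).
    Tap source = new Hfs(new TextLine(new Fields("line")), inputPath);
    Tap sink = new Hfs(new TextLine(), outputPath, SinkMode.REPLACE);

    // Upstream assembly, roughly the "read-data" step of the Cascalog workflow.
    Pipe importPipe = new Pipe("import");

    // The Checkpoint forces the upstream results to be written out at this point.
    Checkpoint checkpoint = new Checkpoint("checkpoint", importPipe);

    // Downstream assembly, roughly the "work-step"; it reads from the checkpoint.
    Pipe workPipe = new Pipe("work", checkpoint);

    // Tap that stores the checkpointed data as tab-delimited text with a header.
    Tap checkpointTap =
        new Hfs(new TextDelimited(true, "\t"), checkpointPath, SinkMode.REPLACE);

    return FlowDef.flowDef()
        .setName("workflow")
        .addSource(importPipe, source)
        .addTailSink(workPipe, sink)
        .addCheckpoint(checkpoint, checkpointTap);
  }

  public static void main(String[] args) {
    // args: input path, output path, checkpoint path
    FlowDef flowDef = buildFlowDef(args[0], args[1], args[2]);
    Flow flow = new HadoopFlowConnector().connect(flowDef);
    flow.complete(); // blocks until the flow has finished
  }
}
```

Registering a tap through addCheckpoint keeps the intermediate data at a location you choose. As far as I know, a Checkpoint with no tap registered is written to a temporary location that Cascading cleans up when the flow completes, which may be the closer analogue to :tmp-dirs.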