@neilkod
Created June 14, 2011 14:46
I want Pig to return "2011-06-14" instead of 2011 minus 6 minus 14
Problem: I'm trying to set a Pig variable to the current date in YYYY-MM-DD format, but Pig interprets the substituted value as an arithmetic expression and evaluates it.
How can I coerce Pig into treating YYYY-MM-DD as a chararray? The cast operator isn't helping here.
Watch:
-bash-3.1$ date +%Y\-%m\-%d
2011-06-14
# 2011 minus 6 minus 14
-bash-3.1$ python -c 'print 2011-6-14'
1991
-bash-3.1$ cat test.pig
raw = LOAD 'hello.txt' as (txt:chararray);
%declare thedate `date +%Y\-%m\-%d`;
tst = foreach raw generate txt, $thedate;
dump tst;
-bash-3.1$ cat hello.txt
hello
world
test.pig produces:
(hello,1991)
(world,1991)
full pig output:
-bash-3.1$ pig -x local test.pig
2011-06-14 07:41:18,718 [main] INFO org.apache.pig.Main - Logging error messages to: /home/nkodner/wip/pig_1308062478717.log
2011-06-14 07:41:18,759 [main] INFO org.apache.pig.tools.parameters.PreprocessorContext - Executing command : date +%Y\-%m\-%d
2011-06-14 07:41:18,899 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-06-14 07:41:19,360 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-06-14 07:41:19,360 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-14 07:41:19,702 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: tst: Store(file:/tmp/temp-2067247496/tmp195828621:org.apache.pig.impl.io.InterStorage) - scope-10 Operator Key: scope-10)
2011-06-14 07:41:19,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-06-14 07:41:19,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-06-14 07:41:19,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-06-14 07:41:19,803 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-06-14 07:41:19,823 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-06-14 07:41:19,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-06-14 07:41:22,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-06-14 07:41:22,284 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-14 07:41:22,285 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-06-14 07:41:22,297 [Thread-3] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2011-06-14 07:41:22,507 [Thread-3] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-06-14 07:41:22,507 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-06-14 07:41:22,520 [Thread-3] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-06-14 07:41:22,794 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-06-14 07:41:22,987 [Thread-3] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-nkodner/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.child.java.opts; Ignoring.
2011-06-14 07:41:23,006 [Thread-3] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-nkodner/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.jobtracker.maxtasks.per.job; Ignoring.
2011-06-14 07:41:23,007 [Thread-3] WARN org.apache.hadoop.conf.Configuration - file:/tmp/hadoop-nkodner/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.reuse.jvm.num.tasks; Ignoring.
2011-06-14 07:41:23,148 [Thread-4] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2011-06-14 07:41:23,155 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-06-14 07:41:23,155 [Thread-4] INFO org.apache.hadoop.mapred.Task - Task attempt_local_0001_m_000000_0 is allowed to commit now
2011-06-14 07:41:23,168 [Thread-4] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp-2067247496/tmp195828621
2011-06-14 07:41:23,168 [Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-06-14 07:41:23,168 [Thread-4] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2011-06-14 07:41:23,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2011-06-14 07:41:28,038 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0001
2011-06-14 07:41:28,042 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-06-14 07:41:28,042 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-06-14 07:41:28,046 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u0 0.8.0-cdh3u0 nkodner 2011-06-14 07:41:19 2011-06-14 07:41:28 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 raw,tst MAP_ONLY file:/tmp/temp-2067247496/tmp195828621,
Input(s):
Successfully read records from: "file:///home/nkodner/wip/hello.txt"
Output(s):
Successfully stored records in: "file:/tmp/temp-2067247496/tmp195828621"
Job DAG:
job_local_0001
2011-06-14 07:41:28,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2011-06-14 07:41:28,056 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-06-14 07:41:28,056 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(hello,1991)
(world,1991)
-bash-3.1$

ktahern commented May 3, 2012

Surround the date variable with single quotes so that it takes the whole thing as a string instead of trying to perform the "-" operation.

tst = foreach raw generate txt, '$thedate';
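
Why the quotes work: Pig's parameter substitution appears to be purely textual, so the value of $thedate is pasted into the script before the statement is parsed. The Python sketch below only mimics that behaviour. `substitute` is a hypothetical helper written for illustration, not Pig's actual preprocessor, and the date is hard-coded to match the paste above.

```python
def substitute(line, params):
    """Replace each $name in a script line with its value (plain text swap).

    Hypothetical stand-in for Pig's preprocessor; assumes no escaping rules.
    """
    for name, value in params.items():
        line = line.replace("$" + name, value)
    return line


# Value that `date +%Y\-%m\-%d` returned in the original run.
params = {"thedate": "2011-06-14"}

unquoted = substitute("tst = foreach raw generate txt, $thedate;", params)
quoted = substitute("tst = foreach raw generate txt, '$thedate';", params)

print(unquoted)  # tst = foreach raw generate txt, 2011-06-14;
print(quoted)    # tst = foreach raw generate txt, '2011-06-14';

# The unquoted form reads as integer arithmetic, which is where 1991 comes from.
print(2011 - 6 - 14)  # 1991
```

In the unquoted line the parser sees three integers and two minus signs; in the quoted line it sees a single chararray literal. To look at the real substituted script without running a job, Pig's -dryrun option should write out a .substituted copy of the script, though that has not been verified against the 0.8.0-cdh3u0 build used above.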

Author

neilkod commented May 3, 2012

heh that's an old paste....i figured this out long ago but never bothered to update the gist...but thanks!!!

ktahern commented May 3, 2012

I didn't see the date on the post. I'm new to Pig, and was looking at how to get the current date inside a pig script, so this post helped me with that. Thanks!!

Author

neilkod commented May 3, 2012

that was a funny issue. It makes sense but I didn't expect pig to evaluate the date as an expression.
