Install pandoc on Mac OS X 10.8
$ brew install haskell-platform
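With the Haskell platform (GHC plus cabal) in place, pandoc itself is typically built through cabal; the gist's exact follow-up commands were lost, but the usual route of that era was:

$ cabal update
$ cabal install pandoc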
user www-data;

# As a rule of thumb: one per CPU. If you are serving a large amount
# of static files, which requires blocking disk reads, you may want
# to increase this beyond the number of cpu_cores available on your
# system.
#
# The maximum number of connections for Nginx is calculated by:
# max_clients = worker_processes * worker_connections
worker_processes 1;

server {
    listen 80 default; ## listen for ipv4; this line is default and implied
    listen [::]:80 default ipv6only=on; ## listen for ipv6

    # Make site accessible from http://localhost/
    server_name localhost;
    server_name_in_redirect off;

    charset utf-8;
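For example, with worker_processes 1 and the stock Ubuntu default of worker_connections 768 (an assumption; check your own nginx.conf), that works out to max_clients = 1 * 768 = 768 concurrent connections.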
This gist covers a simple Hive eval UDF in Java that mimics the NVL2 functionality in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.

Included:
1. Input data
2. Expected results
3. UDF code in Java
4. Hive query to demo the UDF
5. Output
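For reference, a minimal sketch of such a UDF against the classic org.apache.hadoop.hive.ql.exec.UDF API (package, class, jar, table, and column names here are illustrative, not necessarily the gist's):

package demo.hive; // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// NVL2(expr1, expr2, expr3): returns expr2 when expr1 is not null,
// otherwise returns expr3 -- mirroring Oracle's NVL2.
public final class NVL2 extends UDF {
    public Text evaluate(Text expr1, Text expr2, Text expr3) {
        return (expr1 == null) ? expr3 : expr2;
    }
}

Registered and invoked along these lines:

hive> ADD JAR nvl2-udf.jar;
hive> CREATE TEMPORARY FUNCTION NVL2 AS 'demo.hive.NVL2';
hive> SELECT NVL2(phone, 'listed', 'unlisted') FROM customers;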
This gist covers a simple Pig eval UDF in Java that mimics the NVL2 functionality in Oracle.

Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
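As with the Hive version, here is a rough sketch of an NVL2-style Pig eval UDF built on Pig's EvalFunc API (package, class, file, and field names are illustrative):

package demo.pig; // hypothetical package

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// NVL2 semantics: if the first argument is not null, return the second
// argument, otherwise return the third.
public class NVL2 extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() != 3) {
            return null;
        }
        return (String) (input.get(0) == null ? input.get(2) : input.get(1));
    }
}

And a Pig script to exercise it:

REGISTER nvl2-udf.jar;
A = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, phone:chararray);
B = FOREACH A GENERATE name, demo.pig.NVL2(phone, 'listed', 'unlisted');
DUMP B;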
This gist covers the Oozie SSH action.
It includes components of a sample Oozie workflow application: scripts/code,
sample data, and commands. Oozie actions covered: secure shell action, email action.

My blog has documentation and highlights of a very basic sample program:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
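For orientation, the core of such a workflow is an SSH action element in workflow.xml. A minimal sketch (host, script name, and transition targets are assumptions, not the gist's actual values):

<action name="sshAction">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>user@remotehost.example.com</host>
        <command>runScript.sh</command>
        <capture-output/>
    </ssh>
    <ok to="sendEmail"/>
    <error to="fail"/>
</action>

The ok transition can route to an email action, so the workflow sends a notification once the remote script completes.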
My blog has an introduction to reduce-side joins in Java MapReduce:
http://hadooped.blogspot.com/2013/09/reduce-side-join-options-in-java-map.html
**********
Gist
**********
This gist details how to inner join two large datasets on the map side, leveraging the join
capability in MapReduce. Such a join makes sense when both input datasets are too large to
distribute through the DistributedCache, and it can be implemented when both datasets share a
join key and are sorted in the same order by that key.

There are two critical pieces to engaging the join behavior: the job must read its inputs
through CompositeInputFormat, and a join expression describing the inputs must be set in the
job configuration.
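A minimal sketch of that wiring against the old mapred join API (paths, dataset names, and output details are assumptions):

package demo.join; // hypothetical package

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;

public class MapSideJoinDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapSideJoinDriver.class);
        conf.setJobName("map-side-inner-join");

        // Piece 1: read the inputs through CompositeInputFormat.
        conf.setInputFormat(CompositeInputFormat.class);

        // Piece 2: the join expression names the sorted, identically
        // partitioned inputs and the join type ("inner" here).
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/data/employees"), new Path("/data/salaries")));

        // Map-only job: each mapper receives (key, TupleWritable) pairs,
        // where the tuple carries the matching record from each input.
        conf.setNumReduceTasks(0);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(TupleWritable.class);
        FileOutputFormat.setOutputPath(conf, new Path("/data/joined"));

        JobClient.runJob(conf);
    }
}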
| ************************* | |
| Gist | |
| ************************* | |
One more gist related to controlling the number of mappers in a MapReduce job.

Background on Inputsplits
--------------------------
An inputsplit is a chunk of the input data allocated to a map task for processing. FileInputFormat
generates inputsplits (and divides the same into records): one inputsplit for each file, unless the
file is larger than the split size (by default, the HDFS block size), in which case the file is
divided into multiple inputsplits.
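A quick sketch of nudging the split size, and therefore the mapper count, through FileInputFormat's knobs in the new API (the sizes are arbitrary examples):

package demo.splits; // hypothetical package

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");

        // Fewer mappers: raise the minimum split size above the block size,
        // so several blocks are packed into one split.
        FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024); // 256 MB

        // (The opposite lever: lowering the maximum split size below the
        // block size, e.g. setMaxInputSplitSize(job, 32L * 1024 * 1024),
        // produces more, smaller splits and hence more mappers.)
    }
}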
| ********************** | |
| Gist | |
| ********************** | |
A common interview question for a Hadoop developer position is whether we can control the number of
mappers for a job. We can: there are a few ways of controlling the number of mappers, as needed.
Using NLineInputFormat is one way.

About NLineInputFormat
----------------------
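NLineInputFormat creates one inputsplit per N lines of input, so each mapper processes exactly N lines. A minimal sketch of configuring it in the new API (the path, job name, and line count are illustrative):

package demo.nline; // hypothetical package

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "nline-demo");
        job.setInputFormatClass(NLineInputFormat.class);

        // Each mapper receives exactly 1000 lines; a 100,000-line input
        // therefore runs with 100 map tasks.
        NLineInputFormat.setNumLinesPerSplit(job, 1000);
        FileInputFormat.addInputPath(job, new Path("/data/input"));
    }
}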