@sankars
Last active August 29, 2015 14:21
Integration Steps
Install Hadoop on all machines
Download the Nutch source and extract it
Modify nutch-site.xml to set the Nutch HTTP agent and robots agent config
Copy all Hadoop XML configs and hadoop-env.sh to Nutch's conf directory
Build Nutch using Ant
Add the nutch*.job file to the Hadoop classpath
Distribute the built Nutch artifacts to all machines and install them
Create a seed file and place it in HDFS
Start the Nutch MapReduce jobs using the nutch command
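
The configuration, seed file, and job launch steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's own setup: the property name http.agent.name, the agent value, the seed URL, the HDFS directory urls, and the crawldb path crawl/crawldb are illustrative choices, and the helper names are hypothetical. The commands are only built as argument lists (a dry run), since the exact hadoop and nutch invocations depend on the installed versions.

```python
import xml.etree.ElementTree as ET


def build_nutch_site(properties):
    """Return nutch-site.xml text with one <property> element per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def write_seed_file(path, urls):
    """Write one seed URL per line, the layout the injector reads."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")


def crawl_commands(seed_local, seed_hdfs_dir="urls", crawldb="crawl/crawldb"):
    """Build (but do not run) the commands for the last two steps.

    Assumes a Hadoop release whose 'fs -mkdir' accepts -p and a Nutch 1.x
    layout where bin/nutch is run from the deploy directory.
    """
    return [
        ["hadoop", "fs", "-mkdir", "-p", seed_hdfs_dir],
        ["hadoop", "fs", "-put", seed_local, seed_hdfs_dir],
        ["bin/nutch", "inject", crawldb, seed_hdfs_dir],
    ]


if __name__ == "__main__":
    # Assumed agent name; replace with the crawler name you want to announce.
    print(build_nutch_site({"http.agent.name": "MyCrawler"}))
    write_seed_file("seed.txt", ["http://example.com/"])
    for command in crawl_commands("seed.txt"):
        print(" ".join(command))
```

To actually run the printed commands, each list can be passed to subprocess.run(command, check=True) on a machine where Hadoop and the built Nutch job are installed.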