@enachb
Created September 3, 2009 00:04
fixed error returning HDFS instead of local path
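// sticking it in as part of the Hadoop job setup (driver side); args[3] is the local directory holding the lists
// needs: org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, org.apache.hadoop.filecache.DistributedCache, java.net.URI
FileSystem fs = FileSystem.get(jobConf);
fs.mkdirs(new Path("/discovery/UrlDb"));
// copy both lists from the local directory into HDFS
fs.copyFromLocalFile(new Path(args[3] + "/gsb.blacklist"), new Path("/discovery/UrlDb"));
fs.copyFromLocalFile(new Path(args[3] + "/gsb.malwarelist"), new Path("/discovery/UrlDb"));
// register the HDFS copies so every task node gets its own local copy
DistributedCache.addCacheFile(new URI("/discovery/UrlDb/gsb.blacklist"), jobConf);
DistributedCache.addCacheFile(new URI("/discovery/UrlDb/gsb.malwarelist"), jobConf);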
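// to use it in a Cascading function (task side)
public void prepare(FlowProcess flowProcess, OperationCall operationCall) {
    HadoopFlowProcess hfp = (HadoopFlowProcess) flowProcess;
    try {
        // getLocalCacheFiles returns task-local paths, not HDFS paths; it is null when nothing is cached
        Path[] cached = DistributedCache.getLocalCacheFiles(hfp.getJobConf());
        for (Path fn : (cached == null ? new Path[0] : cached)) {
            LOG.info("Loading cached file: " + fn);
            File file = new File(fn.toString()); // plain java.io works from here on
        }
    } catch (IOException e) { // getLocalCacheFiles declares IOException
        throw new RuntimeException("Could not resolve DistributedCache files", e);
    }
}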
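
The loop above stops once it has a java.io.File for each cached path. A minimal sketch of what could come next, assuming the lists are plain text with one entry per line (the gist does not state the file format). CachedListLoader, findByName and loadLines are hypothetical names, not part of Hadoop or Cascading. Paths are passed as plain strings so the sketch runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class CachedListLoader {

    // Hypothetical helper: pick the local cache path whose file name matches,
    // e.g. "gsb.blacklist". Returns null when no path matches.
    static File findByName(String[] localPaths, String name) {
        for (String p : localPaths) {
            File f = new File(p);
            if (f.getName().equals(name)) {
                return f;
            }
        }
        return null;
    }

    // Assumed format: one entry per line. Lines are trimmed and blank lines skipped.
    static Set<String> loadLines(File file) throws IOException {
        Set<String> entries = new HashSet<String>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    entries.add(line);
                }
            }
        }
        return entries;
    }
}
```

Inside prepare() this could be called with fn.toString() for each Path, keeping the resulting Set in a field so the list is loaded once per task rather than once per tuple.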