Emre Aladağ aladagemre

@aladagemre
aladagemre / HBaseEdgeInputFormat.java
Created August 27, 2013 13:48
Draft: HBaseEdgeInputFormat for Giraph
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
public Map<String,Object> run(Map<String,Object> args) throws Exception {
  String crawlId = (String)args.get(Nutch.ARG_CRAWL);
  numJobs = 1;
  currentJobNum = 0;
  currentJob = new NutchJob(getConf(), "update-table");
  if (crawlId != null) {
    currentJob.getConfiguration().set(Nutch.CRAWL_ID_KEY, crawlId);
  }
  //job.setBoolean(ALL, updateAll);
  ScoringFilters scoringFilters = new ScoringFilters(getConf());
aladagemre / mptt-load
Created March 2, 2013 20:43
Easy way of MPTT Fixtures: parsing tree and creating objects manually
"""
Parses a given file as a category tree and saves each item as MPTT Objects.
Example format of the input file:
- Books
-- Novel
-- Tech
--- Science-Fiction
--- Documentary
-- Fun
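The dash-prefixed format above can be turned into parent/child pairs with a small stack-based parser. The following is only a minimal sketch, not the gist's own code: `parse_tree` is a hypothetical helper name, it assumes the number of leading dashes gives the depth, and it returns plain tuples rather than creating MPTT model objects.

```python
def parse_tree(text):
    """Return (name, parent_name) pairs for a dash-indented category tree.

    Assumption: the count of leading dashes is the depth of the item,
    and a top-level item ("- Books") has no parent.
    """
    pairs = []
    stack = []  # stack[i] holds the most recent name seen at depth i + 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("-"):
            continue  # skip blank lines and anything that is not a tree item
        depth = len(line) - len(line.lstrip("-"))
        name = line[depth:].strip()
        if not name or depth > len(stack) + 1:
            raise ValueError("bad line: %r" % raw)
        del stack[depth - 1:]  # drop siblings and deeper items
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


SAMPLE = """\
- Books
-- Novel
-- Tech
--- Science-Fiction
--- Documentary
-- Fun
"""

for name, parent in parse_tree(SAMPLE):
    print(name, "<-", parent)
```

Each pair could then be used to create one object with its parent set, which is the manual alternative to fixtures that the description mentions.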
aladagemre / image_pull_latex.py
Created June 27, 2011 18:25
Image downloader for Latex
"""Asks for image URL, downloads it to images folder.
Generates bibtex record and appends it to the end of the bib file.
Prints the figure LaTeX code."""
import urllib
import shutil
import os
def download_file(url, binary=True):
    filename = url.split('/')[-1].split("?")[0]
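The preview stops here, so the rest of the script is not visible. As a rough illustration of the two steps the docstring describes, the sketch below derives a filename the same way as the line above and builds a LaTeX figure snippet. The helper names `filename_from_url` and `figure_snippet`, the `images` directory default, and the exact figure layout are all assumptions, not the gist's actual output.

```python
import os


def filename_from_url(url):
    """Take the last path segment and drop any query string."""
    return url.split("/")[-1].split("?")[0]


def figure_snippet(filename, caption, images_dir="images"):
    """Build a LaTeX figure environment for a downloaded image.

    The layout is hypothetical; the label is the filename without extension.
    """
    label = os.path.splitext(filename)[0]
    path = images_dir + "/" + filename  # LaTeX paths use forward slashes
    return "\n".join([
        r"\begin{figure}[h]",
        r"\centering",
        r"\includegraphics[width=\textwidth]{%s}" % path,
        r"\caption{%s}" % caption,
        r"\label{fig:%s}" % label,
        r"\end{figure}",
    ])


name = filename_from_url("http://example.com/img/cat.png?size=large")
print(figure_snippet(name, "A cat"))
```

The sketch deliberately leaves out the download and the BibTeX record, since their details are not shown in the preview.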