Pedro Leme pedrocicoleme

@sids
sids / html_parser.clj
Created May 6, 2010 05:44
HTML Parsing in Clojure using HtmlCleaner.
(ns in.grok.history.html-parser
  (:require [clojure.contrib.logging :as log])
  (:import [org.htmlcleaner HtmlCleaner]
           [org.apache.commons.lang StringEscapeUtils]))
(defn parse-page
  "Given the HTML source of a web page, parses it and returns the :title
  and the tag-stripped :content of the page. Does not do any encoding
  detection; it is expected that this has already been done."
  [page-src]
@psd
psd / README
Created January 4, 2010 14:47
OpenOffice Document Conversion
Experiments in running a headless OpenOffice as a document converter for TiddlyDocs, etc.
Running OpenOffice Headless:
$ cd /Applications/OpenOffice.org.app/Contents/program  # Mac OS X
$ cd /usr/lib/openoffice.org.app/program  # CentOS
$ ./soffice.bin -headless -invisible -nofirststartwizard -accept="socket,port=8100;urp;"
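The two platform-specific steps above could be folded into one small wrapper. This is only a sketch: the function names oo_program_dir and oo_command are hypothetical, it assumes any non-Darwin system uses the CentOS path listed above, and it only prints the command line rather than starting OpenOffice.

```shell
#!/bin/sh
# Sketch only: oo_program_dir and oo_command are hypothetical helper names.

# Pick the OpenOffice program directory from the two paths listed above.
# Assumption: anything that is not Mac OS X (Darwin) uses the CentOS path.
oo_program_dir() {
    case "$(uname -s)" in
        Darwin) echo "/Applications/OpenOffice.org.app/Contents/program" ;;
        *)      echo "/usr/lib/openoffice.org.app/program" ;;
    esac
}

# Print (do not run) the headless command line for a given port.
# The port defaults to 8100, the value used above.
oo_command() {
    port="${1:-8100}"
    printf '%s/soffice.bin -headless -invisible -nofirststartwizard -accept="socket,port=%s;urp;"\n' \
        "$(oo_program_dir)" "$port"
}
```

Running oo_command with no arguments prints the command for port 8100; passing another port number, e.g. oo_command 8101, changes only the port in the -accept string.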
init script:
see openoffice.sh and xvfb.sh for init.d scripts