Nicolas Couture ncouture
ncouture / setup.py
Created May 5, 2016 20:10 — forked from mmerickel/setup.py
SSLOnlyMiddleware
from setuptools import setup

setup(
    # Register the middleware as a PasteDeploy filter named "ssl_only".
    entry_points={
        'paste.filter_app_factory': [
            'ssl_only = myapp.middlewares.ssl_only:make_filter',
        ],
    },
)
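The entry point above names a `make_filter` callable in `myapp.middlewares.ssl_only`, but the preview does not show it. What follows is a minimal sketch of what such a module could look like, not the gist author's actual code. It assumes the usual `paste.filter_app_factory` signature `(app, global_conf, **local_conf)`; the class body and the redirect behavior are illustrative assumptions.

```python
from urllib.parse import quote


class SSLOnlyMiddleware:
    """Hypothetical WSGI middleware: redirect plain-HTTP requests to HTTPS."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Requests that already arrived over HTTPS pass straight through.
        if environ.get("wsgi.url_scheme") == "https":
            return self.app(environ, start_response)

        # Rebuild the requested URL with the https scheme.
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
        query = environ.get("QUERY_STRING", "")
        location = "https://" + host + path + ("?" + query if query else "")

        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]


def make_filter(app, global_conf, **local_conf):
    # Assumed paste.filter_app_factory signature: receives the wrapped app
    # plus configuration and returns the wrapping WSGI application.
    return SSLOnlyMiddleware(app)
```

With a module like this installed, a PasteDeploy INI file could reference the filter as `use = egg:myapp#ssl_only`, assuming the distribution is named `myapp`. The sketch trusts `wsgi.url_scheme` and does not strip an explicit port from the host, so a deployment behind a proxy would likely need extra handling.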
@ncouture
ncouture / tidy.conf
Last active July 29, 2019 20:05 — forked from paultreny/tidy.conf
The config file I use for tidy-html5. $ tidy -config <path/to/tidy.conf> input.html > output.html
# Example tidy-html5 configuration file.
fix-uri: yes
keep-time: yes
hide-comment: yes
indent-cdata: yes
omit-optional-tags: yes
output-encoding: utf8
indent: auto
indent-spaces: 2
@ncouture
ncouture / setup.md
Created December 16, 2015 18:47 — forked from xrstf/setup.md
Nutch 2.3 + ElasticSearch 1.4 + HBase 0.94 Setup

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover how to set up Hadoop et al., only the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - filesystem storage for Nutch (Hadoop component, basically)