Static Site Generation as Discrete Stages

So imagine you have the following file structure:

.
├── _plugins
├── _posts
│   ├── 2013-01-01-test-post.md
│   └── 2013-10-02-blah.md
├── _templates
│   ├── base.html
│   └── post.html
└── config.yml

3 directories, 5 files

The idea behind static site generators is that you can run foo-ssg build and get the following:

.
├── index.html
└── posts
    └── 2013
        ├── index.html
        ├── 01
        │   ├── index.html
        │   └── 01
        │       ├── index.html
        │       └── test-post.html
        └── 10
            ├── index.html
            └── 02
                ├── index.html
                └── blah.html

Or something similar, at least. (In this case, I'm imagining the index.html files under the date directories to be per-day/month/year indexes.)

So to accomplish this, we could use Jekyll/Hyde/Pelican/pandoc/whatever. BUT THAT'S NO FUN. Those systems are super complicated, especially to extend. (Jekyll is the best, in my experience, and it still fails in counterintuitive ways.)

Stages

So since I'm apparently obsessed with stream processing and functional programming, we need three kinds of nodes/functions/stages to transform the former file structure into the latter: converters, filters, and renderers. The graph ends up looking like:

digraph ssg {
    // push filenames (optionally with content) to converters
    input -> converter;

    // converters can also listen to other converters
    converter -> converter;

    // all converter output gets filtered
    converter -> filter;

    // the last stage is the renderer - this would probably just be putting
    // rendered templates into HTML.
    filter -> renderer;
}
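
In Python terms, that graph is just composition over streams of items; a sketch, with every name here made up:

def build(inputs, converters, filters, renderers):
    """Wire the graph above together; all stage callables are hypothetical."""
    # converters: n files in, m >= n structured-data objects out
    items = [obj for convert in converters for obj in convert(inputs)]
    # filters: pass items through unchanged, possibly dropping some
    for keep in filters:
        items = list(keep(items))
    # renderers: structured data in, (content, path) pairs out
    return [page for render in renderers for page in render(items)]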

Converters

Take n files' worth of input, produce m (greater than or equal to n) objects' worth of structured data. Can depend on other converters (for example, needing all the posts available to produce archive pages.)

For example, the following markdown file:

---
title: Blah
date: 2013-10-02
---
# Test!

First paragraph

Second paragraph

could produce this JSON object:

{
    "title": "Blah",
    "date": [2013, 10, 2], // punting on the date format here, probably would be unix timestamp
    "summary": "<p>First paragraph</p>",
    "content": "<p>First paragraph</p><p>Second paragraph</p>",
    "type": "post",
    "source": ["_posts", "2013-10-02-blah.md"]
}
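
Here's a minimal sketch of such a converter, assuming PyYAML for the front matter and the markdown package for the body (convert_post and its arguments are made up for illustration):

import markdown  # the markdown package renders the body to HTML
import yaml      # PyYAML parses the front matter

def convert_post(source, raw):
    """One markdown file in, one structured-data object out."""
    # raw is assumed to look like "---\n{yaml front matter}\n---\n{body}"
    _, meta, body = raw.split('---\n', 2)
    fields = yaml.safe_load(meta)
    # summary: the first paragraph that isn't a heading
    first = next(p for p in body.strip().split('\n\n') if not p.startswith('#'))
    return {
        'title': fields['title'],
        'date': fields['date'],  # punting on the format here too
        'summary': markdown.markdown(first),
        'content': markdown.markdown(body),
        'type': 'post',
        'source': source,
    }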

They could do any task which requires conversion of one form or another (PNG crushing and SCSS/LESS compilation come to mind).

Filters

Filters take n items of structured data and produce m (less than or equal to n) unchanged items of structured data.

So given:

{
    // snip
    "draft": true
}

A draft filter would not produce anything, except maybe a log message.
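
A sketch of that draft filter as a generator that yields everything else through unchanged (drop_drafts is a made-up name):

import logging

def drop_drafts(items):
    """n items in, m <= n items out -- drafts are dropped, with a log line."""
    for item in items:
        if item.get('draft'):
            logging.info('skipping draft %s', item.get('source'))
            continue
        yield item  # passed through unchanged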

Renderers

Finally, a renderer takes n items of structured data and produces n HTML (or whatever) pages, plus a path to each. Ideally, each converter has a renderer. So let's take the example of an Atom feed creator, in pseudo-Python code:

converters/atom.py:

return sorted(blogposts, key=lambda p: p['date'], reverse=True)[:10]

renderers/atom.py:

return jinja2.render('atom.xml', posts=atom_posts), 'atom.xml'
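
jinja2 doesn't actually have a module-level render function, so a runnable version of that renderer might look more like this (a minimal sketch, assuming the templates live in _templates/; render_atom is a made-up name):

import jinja2

env = jinja2.Environment(loader=jinja2.FileSystemLoader('_templates'))

def render_atom(atom_posts):
    """n items in, n (content, path) pairs out -- here, exactly one."""
    xml = env.get_template('atom.xml').render(posts=atom_posts)
    return xml, 'atom.xml'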

I/O

My thought is that since none of these things are necessarily Python-specific, it would be easiest to let them be implemented in a diverse body of languages. If BoingBoing or someone wants to write a Go program because their enormous blog is taking too long to render from Markdown, they should be able to. So we need a two-stage build process for each "node":

  1. Discovery

    1. The controller system finds built-in and site-specific nodes in specified folders, and runs each with a JSON input describing the site's configuration.
    2. The node responds with a JSON output describing which streams it wants to subscribe to. (A sketch of this handshake follows the list.)
  2. Rendering

    1. The controller sends each known converter node all of the content that matches its subscriptions.
    2. The converter nodes respond with structured data.
    3. The controller passes the structured data to the filters (again matching subscriptions), paying attention to their filter-or-not signals.
    4. The controller sends all content to the specified renderers.
    5. The renderers return final rendered versions of each file, plus filenames.
    6. The controller creates the necessary files and folders, filled with content.
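
Here's a sketch of that discovery handshake, assuming each node is a standalone executable speaking JSON on stdin/stdout (the discover argument and subscribes key are invented for illustration):

import json
import subprocess

def discover(node_path, site_config):
    """Run a node once, feed it the site config, read back its subscriptions."""
    proc = subprocess.run(
        [node_path, 'discover'],
        input=json.dumps(site_config),
        capture_output=True,
        text=True,
        check=True,
    )
    # the node answers with something like {"subscribes": ["_posts/*.md"]}
    return json.loads(proc.stdout)['subscribes']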

... actually, on second thought, this is gonna be a ton easier to just write Python classes for. We'll start there.
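
A first cut at those classes might look something like this (every name is hypothetical):

class Node(object):
    wants = '*'  # glob of input streams this node subscribes to

    def process(self, items):
        raise NotImplementedError

class Converter(Node):
    """n inputs in, m >= n structured-data objects out."""

class Filter(Node):
    """n items in, m <= n unchanged items out."""

class Renderer(Node):
    """n items in, n (content, path) pairs out."""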
