Lo: thy dread empire, Chaos, is restored;
Light dies before thy uncreating word:
Thy hand, great Anarch! lets the curtain fall;
And universal darkness buries all.
The Dunciad
— Alexander Pope

AsciiDoc and Makefiles

About two years ago, I uploaded the first version of my new homepage, built with the document format AsciiDoc and a then rather simple Makefile of 25 lines. In those two years I wrote 13 articles (including this one), released 6 poems, finished one small Inform 7 project, started working on another but largely scrapped it (who knows, maybe I’ll pick it up again), and put up 2 short stories and 3 small essays. The amount of content and my demands for functionality grew over time, and with them came an obvious increase in complexity.

At this present moment, the Makefile I use has 80 lines. Add to that a small helper script of six lines that parses a custom table format. After abstracting away the complexities of AsciiDoc itself (which is, it has to be said, a beast), what I have here is a sturdy website generator and blogging platform in under 100 lines. The problem, indeed, is not the generator itself but bending AsciiDoc to my will, that is to say, dealing with its many shortcomings when deployed as a “website backend”.

One of those shortcomings is the inherent staticity; the fact that all information must be present (and in-file) at “rendering” time. Serving dynamic content (by which I mean content that changes for each article published and is usually displayed on more than one page) is thus only possible by utilizing include directives. Such a directive looks like this:

src/txt/index.txt (excerpt)
include::../includes/tmp/recent.txt[]
src/includes/tmp/recent.txt
- (`2016-11-10`) -- link:untitled.html[Untitled]
- (`2016-11-06`) -- link:ky0_act4.html[No Shame to be Forgotten - Kentucky Route Zero, Act IV]
- (`2016-11-05`) -- link:night.html[At Night]
- (`2016-08-04`) -- link:cat.html[The Cat] (_English_), link:katze.html[Die Katze] (_Deutsch_)
- (`2016-08-01`) -- link:inside.html[Life in Muted Colours - A Look at Inside and LIMBO]

Back when I first implemented this, all the includes were already in AsciiDoc markup (as seen above). This was simple, fast, and it solved the problem. I no longer had duplicated information scattered across pages, and in the Makefile I could use the single include file as a dependency for all the pages that needed it. The first contrivance I then added was parsing the individual include files to dynamically create a file for recent content. This meant merging the files, sorting by date (which was luckily included), and then taking the six most recent entries. The relevant entry in the Makefile looked like this:

$(INC)/tmp/recent.txt: $(INC)/content/*.txt
       sed -n '/^\/\// !p' $^ | sort -rn -k2.3 | head -n 6 >$@

This was acceptable at the time and seemed only like a small hack. Having the includes in AsciiDoc markup was beginning to seem more like a crutch, however. After all, a time might come in which I would need to parse those files in a more complex way…

RSS and file sorting

Somewhen™ in 2016, I added RSS functionality and a feed to this website. Having the content sorted by date in the .rss file is quite important, and I was lucky to have chosen a convenient naming pattern for the articles that would end up in the feed. Of course the individual articles themselves contained the date, but parsing them (and thus putting even more code into the generation of the site) was something I thought unsustainable.

Articles used to match the following pattern: [0-9]{3}[a-z]_.+

The first three digits are a 0-padded counter; then — right before the final underscore — a single letter denotes the post’s category.
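To make the pattern concrete, here is a minimal sketch that splits such a name into its counter, category letter, and remainder. The function name `parse_name` and the sample name `013a_complexity` are made up for illustration and do not come from the actual site.

```shell
#!/bin/sh
# Hypothetical helper: split a name matching [0-9]{3}[a-z]_.+ into
# "counter category rest". Prints nothing if the name does not match.
parse_name() {
    printf '%s\n' "$1" |
        sed -n -E 's/^([0-9]{3})([a-z])_(.+)$/\1 \2 \3/p'
}

# Illustrative name only, not a real file from the site.
parse_name 013a_complexity
```

Because the counter is zero-padded to a fixed width, a plain lexical sort of these names is also a chronological sort, which is what the feed generation relied on.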

The .rss files themselves are also automatically created on demand with AsciiDoc and a docbook converter. Thus it was easy to get a chronologically sorted list of RSS blurbs for processing the main feed:

$(SITE)/rss.xml: $(XRSS) $(RSS)/header.rss $(RSS)/footer.rss
    find $(RSS_TEMP) -iname "*.rss" -print0 | sort -zr | xargs -0 cat > $(RSS)/body.rss

This worked fine, but now there was already an apparent doubling of data (namely dates in the includes themselves as markup, and an “emergent” date sorting by file number). Plus, the numbering does not look very appealing when it ends up in a URL on the website. So I decided, after someone pointed out the former point, to take the numbers out.

As this would only affect the RSS generation, I thought it an easy task. “After all, I could just parse the include files, right?”, I mused. At this point, however, parsing them became a nightmare. I ended up with the following eldritch monstrosity:

$(SITE)/rss.xml: $(XRSS) $(RSS)/header.rss $(RSS)/footer.rss
    sed -n '/^\/\// !p' src/includes/content/{articles,prose,poetry}.txt | sort -rn -k2.3 | \
    awk -F: '{print $$2}' | egrep -o '.*\.html' | sed "s/html/rss/g; s|^|$(RSS_TEMP)/|g" | \
    tr '\n' '\0' | xargs -0 cat > $(RSS)/body.rss

Had it not been for feedback from someone who was visibly appalled by this hellish nightmare, I would have kept it. Looking back, I’m glad I didn’t. Turns out that there is indeed a much better and cleaner way of doing this.

Tables and shell scripts

Clearly the problem was that the include files were already in AsciiDoc markup and therefore quite unparsable. After some deliberation and feedback, I decided that the cleanest way to generate dynamic includes and RSS file lists was to have a custom “table” format with which to describe the dynamic content. It is a very simple CSV-like format with | as the delimiter:

src/tables/prose
2016-08-04|e_cat|[The Cat] (_English_), link:katze.html[Die Katze] (_Deutsch_)
2016-07-13|e_schreiben|[Über das Schreiben] (_Deutsch_), link:writing.html[On Writing] (_English_)
2016-07-07|e_notebooks|[On Notebooks]
2016-06-07|s_farmer|[Short Fiction: A Farmer]
2016-06-01|s_modes|[Short Fiction: Modes of Transportation]

For each category of content there exists a separate file: src/tables/articles, src/tables/inform, [..]. Parsing these is very easy with a small shell script that reads directly from stdin:

gen_include.sh
#!/bin/mksh

while read -r line; do
    IFS="|" read -r date file asciidoc <<< "$line"
    printf "- (\`%s\`) -- link:%s.html%s\n" "$date" "$file" "$asciidoc"
done
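As a quick illustration of what the script produces, here is a minimal sketch of the same conversion written as a POSIX sh function and fed two rows from the prose table above. The function name `gen_include` is my own choice for this sketch. It also passes `--` to `printf`, because the builtin `printf` of some shells (bash and dash, for instance) would otherwise treat a format string starting with `-` as an option.

```shell
#!/bin/sh
# Sketch only: the table-to-AsciiDoc conversion as a function.
gen_include() {
    # Split each row on "|" into date, file stem, and the remaining
    # AsciiDoc text (which may itself contain further "|" characters).
    while IFS='|' read -r date file asciidoc; do
        printf -- '- (`%s`) -- link:%s.html%s\n' "$date" "$file" "$asciidoc"
    done
}

gen_include <<'EOF'
2016-07-07|e_notebooks|[On Notebooks]
2016-06-07|s_farmer|[Short Fiction: A Farmer]
EOF
# Prints:
# - (`2016-07-07`) -- link:e_notebooks.html[On Notebooks]
# - (`2016-06-07`) -- link:s_farmer.html[Short Fiction: A Farmer]
```

The third column holds everything after the link target, so a row can attach the link label plus any trailing markup, such as a second link to a translation.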

The generation of the most recently published content is equally easy. Together with the rules for each table file, the relevant part of the Makefile now looks like this:

$(INC)/tmp/content/%.txt: $(TAB)/%
        ./gen_include.sh < $^ > $@

$(INC)/tmp/recent.txt: $(TAB)/*
        sort -rn $^ | head -n 6 | ./gen_include.sh >$@
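The "most recent" rule boils down to merging all tables, sorting newest first, and keeping the top rows. The sketch below shows that step in isolation. The function name `recent_rows`, the temporary directory, and the second table name `essays` are assumptions made for the example, while the rows themselves are taken from the prose table above. It uses a plain reverse sort instead of `sort -rn`, on the assumption that every row starts with an ISO date (YYYY-MM-DD), which orders correctly as text.

```shell
#!/bin/sh
# Hypothetical helper: print the N newest rows from the given table files.
recent_rows() {
    count=$1
    shift
    # Rows start with YYYY-MM-DD, so a reverse lexical sort puts the
    # newest entries first. LC_ALL=C keeps the ordering byte-wise.
    LC_ALL=C sort -r "$@" | head -n "$count"
}

# Two throwaway tables in a temporary directory (illustrative layout).
tab=$(mktemp -d)
printf '%s\n' \
    '2016-08-04|e_cat|[The Cat]' \
    '2016-06-07|s_farmer|[Short Fiction: A Farmer]' > "$tab/prose"
printf '%s\n' \
    '2016-07-07|e_notebooks|[On Notebooks]' > "$tab/essays"

recent_rows 2 "$tab"/*
# Prints:
# 2016-08-04|e_cat|[The Cat]
# 2016-07-07|e_notebooks|[On Notebooks]

rm -rf "$tab"
```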

Getting a sorted list of RSS files has also become a very simple job and no longer requires the user to summon demonic beings:

$(SITE)/rss.xml: $(XRSS) $(RSS)/header.rss $(RSS)/footer.rss $(TAB)/*
        sort -rn src/tables/* | awk -F\| '{print "rss/tmp/" $$2 ".rss"}' | \
        tr '\n' '\0' | xargs -0 cat > $(RSS)/body.rss
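The heart of that rule is the awk step, which turns each table row into the path of its RSS blurb. A minimal sketch of just that mapping follows. The function name `rss_paths` is hypothetical, and the directory is passed in as an argument here rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: map table rows on stdin to .rss blurb paths.
# $1 is the directory that holds the per-article .rss files.
rss_paths() {
    # The second "|"-separated column is the file stem of the article.
    awk -F'|' -v dir="$1" '{ print dir "/" $2 ".rss" }'
}

printf '%s\n' \
    '2016-08-04|e_cat|[The Cat]' \
    '2016-07-07|e_notebooks|[On Notebooks]' | rss_paths rss/tmp
# Prints:
# rss/tmp/e_cat.rss
# rss/tmp/e_notebooks.rss
```

Since the rows arrive already sorted by date, the resulting path list can be concatenated in order to form the feed body.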

Remaining annoyances and imperfections

Thus the journey to nicer dynamic content generation and neater file names is complete. Sadly, I cannot yet eliminate the “tag” with which I mark a page’s category from the file name (like the a in complexity.txt for this particular article), as I use it to choose the corresponding CSS and HTML class for the banner. This is all done through the rather horrible AsciiDoc config file (see below), and shall probably remain horrible until a benevolent faerie bestows upon me the power and motivation to overcome this limitation myself.

conf/xhtml11.conf (excerpt)
{infile@.*[pse]_.*.txt:<link rel="stylesheet" href="{stylesdir}/dless.css" type="text/css" />:}

Initially I wanted to hide the tag by using lighttpd’s rewrite function, but as all the links in the AsciiDoc text files are relative, this would mean I’d have to make them all absolute — something which I am not ready to do.

In any case, enjoy. There’s lots more to come.