First Crack Release Notes, December 2019

By Zac J. Szewczyk on 2019/12/31 06:41:36 EST in Programming

I spent a lot of time on First Crack this month. Writing last month’s release notes pushed me to stop dragging my feet, and I made some great progress.

December Activity #

I talked about the problems multi-threading caused in last month’s post, and my band-aid fix: instantiating a Markdown parser for each file. This pushed First Crack’s sub-second runtime past the one-second mark and just felt lazy, so I set out to fix it in early December.

I first binned the files by year, then fed each bin to its own core. First Crack rebuilt HTML files as needed, sorted them by time, then went back over them to extract their title and content for the blog and archives pages. I chose this approach because I had to build those index pages from newest to oldest, and it allowed the engine to do most of the file operations (and all the Markdown parsing) up front, as fast as possible, and without regard to order. This rewrite ended up running slower than the original, though, and had its fair share of strange bugs¹. I played around with it for the rest of the weekend, but the rewrite I had such high hopes for never made the cut.
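As a rough sketch of the binning approach described above (not First Crack’s actual code), the snippet below groups posts by year, hands each bin to its own worker process, and sorts the results newest to oldest only once the parallel step has finished. The names `bin_by_year`, `render_bin`, `newest_first`, and `build`, the `(timestamp, path)` input shape, and the placeholder rendering step are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime, timezone


def bin_by_year(posts):
    """Group (timestamp, path) pairs into one list per calendar year (UTC)."""
    bins = defaultdict(list)
    for timestamp, path in posts:
        year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
        bins[year].append((timestamp, path))
    return dict(bins)


def render_bin(year_posts):
    """Hypothetical stand-in for the per-bin work; order does not matter here."""
    return [(ts, path, f"<article>{path}</article>") for ts, path in year_posts]


def newest_first(rendered):
    """Index pages are built newest to oldest, so sort once at the end."""
    return sorted(rendered, key=lambda item: item[0], reverse=True)


def build(posts, workers=None):
    """Render each year's bin in its own process, then order the results."""
    bins = bin_by_year(posts)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(render_bin, bins.values())
        rendered = [item for chunk in chunks for item in chunk]
    return newest_first(rendered)
```

When run as a script, `build` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Process start-up and the cost of passing results between processes can easily outweigh the parallel gain on small files, which is one plausible reason a design like this might run slower than a single-process build.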

I did find a few ways to improve performance during the course of that rewrite, though, which ended up cutting runtime by up to half. Check out the GIF below, via Terminalizer, which shows the latest version of First Crack clearing and then building my entire site ten times:

First Crack rebuilds the entire site in at most 0.81 seconds, and in as little as 0.64, its best yet. De-duplicating work across a few methods helped, but for the most part that massive speed boost came from slashing I/O operations. To build the Post Archives page and RSS feed, the engine used to sort every file, read a certain number of paragraphs based on article type, open the target, append that content, close both, and then repeat until it got through every post. It now does this in batches, which got rid of almost 1,000 file operations each on archives.html and rss.xml. This had the greatest impact on First Crack’s runtime in December, by far.
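To make the difference concrete, here is a minimal sketch of the two I/O patterns, assuming each post’s excerpt has already been extracted into a string. The function names are hypothetical and this is not the engine’s actual code; it only contrasts one open/append/close cycle per post with a single batched write.

```python
def append_one_at_a_time(entries, target):
    """Old pattern: open, append, and close the target once per post."""
    for entry in entries:
        with open(target, "a", encoding="utf-8") as handle:
            handle.write(entry)


def append_in_batch(entries, target):
    """Batched pattern: join the content in memory, then write it once."""
    with open(target, "a", encoding="utf-8") as handle:
        handle.write("".join(entries))
```

Both functions produce the same file. The batched version trades a little memory for one open and one close on the target, however many posts there are.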

Although Jeff Huang’s recent push for developers to write well-structured HTML to help novices learn did give me pause, I also gave up on “pretty printing” in December. Minifying the template alone shrank its file size by almost 25%. Applying this to the rest of the build process shaved almost 1KB off the average file size. As a result, First Crack takes less time to build smaller files that then load faster for you: a win all around. All DOM inspection tools reformat HTML anyway, so I’ll take the performance boost with no real downside.
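For illustration only, dropping pretty printing can be as simple as stripping indentation and blank lines from the output. The sketch below assumes that is all “minifying” means here; the function name is hypothetical, and it would also flatten indentation inside `<pre>` blocks, so a real build would need to skip those.

```python
def strip_pretty_printing(html):
    """Drop per-line indentation and blank lines; leave the markup itself alone."""
    lines = (line.strip() for line in html.splitlines())
    return "\n".join(line for line in lines if line)
```

A more aggressive minifier would also remove the remaining newlines and any comments, but whitespace between inline elements can change how a page renders, so this sketch keeps one newline per line.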

Feature Roadmap #

Going forward, I plan to focus on these (mostly minor) features.

Release Markdown Parser #

I still want to release my Markdown parser as its own project. I have some bugs to work out first, though. I also want to go public with greater coverage of the spec, and I would like to add the ability to parse multi-line strings and entire files at once.

Publish Implementation of Markdown Spec #

I still want to outline the peculiarities of my implementation of the Markdown spec. This would cover weird edge cases for the most part, but documenting these shortfalls would still have value so that those who use my engine will have some sort of explanation for why their article looks weird, and so that I may one day fix them. I made some progress here this month, but not enough for a finished product.

Improve Documentation #

As always, I could do more here. Again, a few of the ways I think I can improve the README:

Graphs of First Crack’s back-end performance versus other, similar engines. The engine builds a website of over one thousand pages in less than two seconds, and I want to highlight that.
Performance graphs of the web pages First Crack builds versus the pages common content management systems build.
Screenshots. This site is a live demo of the engine, but I like the idea of having a few pictures in there, too.

As always, I look forward to the work ahead.

¹ How about this one: if the beta engine creates a new Markdown parser for each file, a seemingly random subset of 2019 posts does not get parsed. Instead, the original Markdown appears in the output HTML files as if copied straight from the source. If the engine creates a new parser for each month, more files go unparsed, this time from other years as well. If each core gets one parser to handle the entire year, the parser skips over almost 10% of the source files. Weird.