First Crack Release Notes, November 2019

By Zac J. Szewczyk on 2019/12/04 08:19:51 EST in Programming

A few days late, again, but here it is: First Crack’s release notes for November 2019. Again in November, like in October, I spent most of my dev time on an Instapaper-like read-later service. I use it every day, and plan to release it. I did get a couple things done, though; once again, I did not neglect First Crack entirely.

November Activity

CSS grid replaced old-fashioned tables to make the navigation menu, archives page, and year breakdowns more responsive. I had been using media queries to make the site responsive before, but CSS grid does it better, more consistently, and with less code. About time I caught up to the rest of the industry.

I also spent some more time on — shocker — performance. Some legacy code cruft was adding a minuscule amount of unnecessary overhead to First Crack. For an engine that generates over a thousand pages in less than a second, though, that overhead meant the difference between execution time looking like 0.85 < runtime < 1.2 and 0.83 < runtime < 0.90. I then gave up my hard-won performance gains to avoid a race condition.

During initialization, First Crack used to create one Markdown parser for all the files. This worked fine, because the parser’s buffers got cleared for each new file. I started having problems in August. With the engine now running on multiple cores but still using that one parser, each core kept overwriting the buffers with data from a different file. Although a clear flaw, the vast majority of my articles consist of paragraphs, so this had almost no impact: if core #1 processed three paragraphs and then core #7 jumped in with a paragraph before core #1 read its fourth paragraph, both cores still built an HTML file with four and one paragraphs, respectively. This became a problem, though, when core #1 processed three lines of a table, and then core #7 jumped in with a paragraph. The parser then made two wrong decisions: first, because the previous lines were table rows and it now had a plain paragraph, core #7 would receive a close table tag plus the new line formatted as a paragraph; then, when core #1 sent the fourth row of its table, the parser saw this as a new table and sent back an open table tag plus the new line formatted as a table row. At the end of this convoluted process, core #7 had a file with an unnecessary </table>, and core #1 had a file with at least two nested tables.

But wait, it gets better.

If core #1 managed to get through all four lines of that table before core #7 sent its paragraph, everything worked — but if core #3 entered an unordered list partway through, and core #8 entered a blockquote, while cores #2, #4, #5, and #6 kept sending the parser paragraphs... Yikes. Across eight cores and over a thousand files, First Crack got different lines wrong in different ways, every time I ran it — but it still got most of them right.
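The table-versus-paragraph mix-up described above can be reproduced without any real concurrency. This is a minimal sketch, not First Crack's actual parser: `ToyParser` and its `parse_line` method are hypothetical stand-ins, and the interleaving is scripted by hand so the result is the same on every run.

```python
class ToyParser:
    """Hypothetical line-at-a-time parser that remembers if it is inside a table."""

    def __init__(self):
        self.in_table = False

    def parse_line(self, line):
        out = ""
        is_row = line.startswith("|")
        if is_row and not self.in_table:
            out += "<table>"      # previous line was not a row: open a table
            self.in_table = True
        elif not is_row and self.in_table:
            out += "</table>"     # previous line was a row: close the table
            self.in_table = False
        if is_row:
            cells = [c.strip() for c in line.strip("|").split("|")]
            out += "<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>"
        else:
            out += f"<p>{line}</p>"
        return out


shared = ToyParser()  # one parser shared by every "core"
table_file = ["|a|1|", "|b|2|", "|c|3|", "|d|4|"]
core1, core7 = [], []

for row in table_file[:3]:                      # core #1 sends three table rows
    core1.append(shared.parse_line(row))
core7.append(shared.parse_line("Hello"))        # core #7 jumps in with a paragraph
core1.append(shared.parse_line(table_file[3]))  # core #1 sends its fourth row

core1_html = "".join(core1)
core7_html = "".join(core7)
```

Here `core7_html` starts with a stray `</table>`, and `core1_html` contains two `<table>` tags with no close between them, which matches the broken output described above.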

I solved this problem by creating an instance of the Markdown parser for each file. This overkill band-aid put execution time back in the 0.85 < runtime < 1.2 neighborhood. I hope to win back some of that performance by creating one instance of the Markdown parser per core, but I have not gotten around to that yet.
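A rough sketch of the one-parser-per-file fix follows. `ToyParser`, `finish`, and `render_file` are hypothetical names, and the thread pool only stands in for the engine's real worker setup; the point is that no parser state is shared between files.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyParser:
    """Hypothetical stateful parser; only tracks whether a table is open."""

    def __init__(self):
        self.in_table = False

    def parse_line(self, line):
        out = ""
        is_row = line.startswith("|")
        if is_row and not self.in_table:
            out, self.in_table = "<table>", True
        elif not is_row and self.in_table:
            out, self.in_table = "</table>", False
        if is_row:
            cells = line.strip("|").split("|")
            return out + "<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>"
        return out + f"<p>{line}</p>"

    def finish(self):
        """Close a table left open at the end of a file."""
        if self.in_table:
            self.in_table = False
            return "</table>"
        return ""


def render_file(lines):
    parser = ToyParser()  # fresh instance per file: nothing to overwrite
    body = "".join(parser.parse_line(line) for line in lines)
    return body + parser.finish()


# 100 small "files", alternating between a table and two paragraphs.
files = [["|a|1|", "|b|2|"], ["Hello", "World"]] * 50
with ThreadPoolExecutor(max_workers=8) as pool:
    pages = list(pool.map(render_file, files))
```

Every page comes out the same no matter how the workers interleave, at the cost of constructing a parser for each file.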

Last, First Crack now resizes the command line interface if it detects a window less than 59 characters wide. I chose this number because the longest menu item comes in at 59 characters long, and took the time to do this because I dislike wrapped text. For the most part, it looks like garbage in a terminal — but not so with First Crack anymore. For small windows, First Crack gracefully resizes the menu to 45 characters. Any smaller than that, and the text will wrap as usual.
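The width check can be sketched in a few lines. The 59 and 45 come from the paragraph above; the function names and the header layout are assumptions for illustration, not First Crack's actual code. `shutil.get_terminal_size` is the standard-library call for reading the window size.

```python
import shutil

FULL_WIDTH = 59     # longest menu item, per the description above
COMPACT_WIDTH = 45  # width used for smaller windows


def menu_width(columns):
    """Pick the menu width for a terminal that is `columns` characters wide."""
    return FULL_WIDTH if columns >= FULL_WIDTH else COMPACT_WIDTH


def render_header(title, columns=None):
    """Draw a hypothetical menu header sized to the current terminal."""
    if columns is None:
        # Falls back to 80x24 when the size cannot be detected.
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    width = menu_width(columns)
    rule = "=" * width
    return "\n".join([rule, title.center(width), rule])


print(render_header("First Crack"))
```

A window narrower than 45 columns still gets the 45-character menu here, so the terminal wraps it as usual.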

Feature Roadmap

Going forward, I plan to focus on these (mostly minor) features.

Reduce Multiprocessing Overhead

As I said above, First Crack now creates a new instance of the Markdown parser for each file. This avoids the race condition, but since the engine farms out posts by year to the individual cores, it only needs to create one instance of the parser per core. Then, as each core finishes processing a file and opens a new one, First Crack can clear that core’s parser’s buffers before moving on to the next file. This will minimize unnecessary overhead while still generating correct output files.
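One way this plan could look, sketched with the standard `multiprocessing.Pool` and its `initializer` argument. Everything else here is hypothetical: `ToyParser`, `reset`, `init_worker`, `render_year`, and `build` are illustrative names, and the toy parser only tracks table state.

```python
from multiprocessing import Pool


class ToyParser:
    """Hypothetical stateful parser; only tracks whether a table is open."""

    def __init__(self):
        self.in_table = False

    def reset(self):
        """Clear all per-file state before starting a new file."""
        self.in_table = False

    def parse_line(self, line):
        out = ""
        is_row = line.startswith("|")
        if is_row and not self.in_table:
            out, self.in_table = "<table>", True
        elif not is_row and self.in_table:
            out, self.in_table = "</table>", False
        if is_row:
            cells = line.strip("|").split("|")
            return out + "<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>"
        return out + f"<p>{line}</p>"

    def finish(self):
        return "</table>" if self.in_table else ""


_parser = None  # one instance per worker process, set by init_worker


def init_worker():
    """Runs once in each worker process: build that worker's parser."""
    global _parser
    _parser = ToyParser()


def render_year(files):
    """Render every file from one year with this worker's single parser."""
    pages = []
    for lines in files:
        _parser.reset()  # clear the buffers instead of building a new parser
        body = "".join(_parser.parse_line(line) for line in lines)
        pages.append(body + _parser.finish())
    return pages


def build(years, workers=8):
    """Farm out one list of files per year; returns one list of pages per year."""
    with Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(render_year, years)
```

Calling `build(years)` from under an `if __name__ == "__main__":` guard would create at most `workers` parsers in total, rather than one per file.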

Re-Implement "Pretty Print"

I would still like to re-implement the “pretty print” feature now that First Crack uses a stateful Markdown parser. Low priority, but something I want to get done nonetheless.

Release Markdown Parser

I still want to release my Markdown parser as its own project. I have some bugs to work out first, though. I also want to go public with greater coverage of the spec, and I would like to add the ability to parse multi-line strings and entire files at once.

Publish Implementation of Markdown Spec

I still want to outline the peculiarities of my implementation of the Markdown spec. This would cover weird edge cases for the most part, but documenting these shortfalls would still have value so that those who use my engine will have some sort of explanation for why their article looks weird, and so that I may one day fix them. I made some progress here this month, but not enough for a finished product.

Improve Documentation

I did some work on the documentation this month, but as always, I could do more. Again, a few of the ways I think I can improve the README in particular:

Performance graphs of First Crack’s back-end performance versus other, similar engines. At less than two seconds to build a website of over one thousand pages, I want to highlight this.
Performance graphs of the web pages First Crack builds versus the pages common content management systems build.
Screenshots. This site is a live demo of the engine, but I like the idea of having a few pictures in there, too.

As always, I look forward to the work ahead.