Wrapping up the blog project

This site is now migrated off the old hosting provider, so everything is cool! I've decided to put a pin in writing about this blog project, since I've lost some momentum (and enthusiasm) for it. If you're curious about how it all turned out, the source code is on GitHub. Most of the interesting stuff is in the core app file.

It's funny: I partly decided to write my own blog app because I didn't want to spend any time trying to import my old blog into Jekyll or Pelican, but to be honest, I probably could've figured that out in the time it took me to write Smarf.

It's been a fun project that I learned a lot from, but I guess I don't have the patience for tech blogging anymore. 🤷


Reading and rendering a post

So now I have a pretty straightforward method of creating blog post files. As you may have guessed, I'm relying on the filename structure to provide the post ordering. Why? Because I'm lazy. Although I'm recording the post creation date inside each file, I'd rather skip having to load all posts (at least the metadata) into memory and read the date field in order to sort them. I could generate some kind of index, but nah. And that's easy to add later. Databases are tricked out for this sort of thing, but I'm not using a database. Filenames it is!
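
A quick sketch of why filename ordering works: a plain string sort puts timestamp-prefixed names in chronological order, as long as every date and time field is zero-padded to a fixed width. The filenames are samples in the scheme used for this blog; the variable names are just for illustration.

```typescript
// Sample filenames in the timestamp-prefixed scheme used for this blog's posts.
const filenames: string[] = [
    '2019-12-10_22-15-55-728_we_have_ignition.md',
    '2019-12-03_22-10-19-000_hello_world.md',
    '2019-12-05_17-26-40-406_some_admin_notes.md',
];

// sort() with no comparator orders strings by UTF-16 code units, which for
// zero-padded timestamps is the same as chronological order.
const oldestFirst: string[] = filenames.slice().sort();

// A blog index usually wants the newest post first.
const newestFirst: string[] = oldestFirst.slice().reverse();

console.log(oldestFirst[0]); // 2019-12-03_22-10-19-000_hello_world.md
console.log(newestFirst[0]); // 2019-12-10_22-15-55-728_we_have_ignition.md
```

No file has to be opened or parsed to get this ordering, which is the whole appeal.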

So now I need a way to read the post files and use a template to render HTML output. The idea here is to:

  1. given a filepath, find and open that file
  2. read the metadata and contents of the file
  3. parse any Markdown in the contents
  4. return a Post object representing the post

Luckily there's a half-kabillion NPM libraries for parsing text. I chose the standard frontmatter library to parse the post into a JSON object, and markdown-it to convert Markdown content into HTML:

const frontmatter = require('frontmatter');
const markdown = require('markdown-it')({ html: true });
const fs = require('fs');
import { Post } from './types';
import { checkType } from './checkType';

export function getPostData(filename: string): Post {
    const postContents: string = fs.readFileSync(`./posts/${filename}`, { encoding: 'utf8' });
    const parsedContents: any = frontmatter(postContents);
    const post: Post = JSON.parse(JSON.stringify(parsedContents.data));
    
    checkType(post, 'Post');
    post.content = markdown.render(parsedContents.content);
    return post;
}

The frontmatter library returns the parsed file contents as a single JS object with two properties: data for the metadata header section, and content for the body content. I'm using the JSON.parse(JSON.stringify(...)) trick to create a deep copy of the post metadata JSON and assign it to a Post type constant.
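
For the curious, here is a small sketch of what the JSON round trip does and doesn't preserve. The parsedData object is a made-up stand-in for the frontmatter data property, not real output from the library.

```typescript
// Hypothetical metadata object standing in for the parsed frontmatter `data`.
const parsedData = {
    title: 'Hello world',
    tags: ['intro', 'meta'],
    created: new Date('2019-12-03T22:10:19Z'),
    excerpt: undefined as string | undefined,
};

const copy = JSON.parse(JSON.stringify(parsedData));

// The copy is independent: changing a nested array leaves the original alone.
copy.tags.push('extra');
console.log(parsedData.tags.length); // 2
console.log(copy.tags.length);       // 3

// Caveats: Date values come back as ISO strings, and properties whose
// value is undefined are dropped entirely.
console.log(typeof copy.created);    // string
console.log('excerpt' in copy);      // false
```

For flat metadata made of strings and numbers the caveats don't bite, which is presumably why the trick is good enough here.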

A simple template

From here it should be pretty simple to pass the post object to a function that uses it to return some HTML. I'd like all my primary template functions to adhere to a particular interface, so I added the following type definition to src/types:

export type Template = (posts: Post | Post[], blog?: Blog) => string;

Which is to say: functions of type Template must support 1) a single blog Post OR an array of Posts as an argument, and 2) an optional Blog argument (our parsed blog config), and must return a string.
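
As a rough illustration of that contract, here's a sketch of a template that accepts either shape by normalizing to an array first. The trimmed-down Post and Blog interfaces and the titleListTemplate name are assumptions for the example, not the real definitions in src/types.

```typescript
// Minimal stand-ins for the real Post and Blog types (only the fields used here).
interface Post { title: string; content: string; }
interface Blog { title: string; }

type Template = (posts: Post | Post[], blog?: Blog) => string;

// Hypothetical template: renders a plain-text list of post titles.
const titleListTemplate: Template = (posts, blog) => {
    // Normalize the union so the rest of the function only deals with arrays.
    const list: Post[] = Array.isArray(posts) ? posts : [posts];
    const heading = blog ? `${blog.title}\n` : '';
    return heading + list.map((post) => `- ${post.title}`).join('\n');
};

const one: Post = { title: 'Hello world', content: '' };
console.log(titleListTemplate(one)); // - Hello world
console.log(titleListTemplate([one, one], { title: 'My blog' }));
```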

Sweet, let's prototype a simple template function that generates a bare-bones HTML page from a post:

import { Post, Template } from '../src/types';
import { sanitize } from '../src/';
const df = require('user-friendly-date-formatter');

function getDisplayDate(dateTime: string) {
    return df(new Date(dateTime), '%D %fM %YYYY');
}

export const htmlPostTemplate: Template = (post: Post): string => {
    const title = sanitize(post.title);
    return `<article>
        <h1><a href="${ post.guid }">${ title }</a></h1>
        <time datetime="${ post.date }">${ getDisplayDate(post.date) }</time>
            ${ post.content }
        </article>
        `;
}

It's almost like JavaScript template literals were made for this kind of thing!

Okay, next up we glue this all together.


20 years since Y2K

Most of what I remember was my first taste of internet-fueled anxiety. The newly commercialized web, teeming with articles from preppers, conspiracy theorists, religious doomsayers, and actual reporters. I was working at Cleveland Live, coming in at 5AM to make sure the day's news files from The Cleveland Plain Dealer had arrived via FTP, then stitching them together into HTML. As the fancy new local news website -- our commercial radio spots pronounced the 'www-dot' out loud -- we had several televisions in the office tuned to national news. So my work life was pretty much all news all day.

It was a good year for movies. It was likely the first year I heard the word "blog" used in media, probably on the local NPR station. My ongoing fascination with DHTML -- which today we'd call "DOM manipulation via JavaScript and CSS" -- was beginning to morph into a kind of profession.

I don't recall being too wound up over Y2K, but I was eventually convinced that keeping a few plastic gallon milk jugs filled with water, storing them in the kitchen closet, just in case, was a wise thing to do in any situation.

Anyway, I was on call the night of Y2K. I probably went to a party with friends. I woke up on January 1st 2000 and logged into our system from my 1998 Gateway PC. I saw that nothing of note had happened, save the beginning of a new century and millennium. I compiled a list of links to related news stories and posted them along with a stock photo of the Earth, adding a section headline paraphrased from the PD: "Y2K Bug Nowhere To Be Found As Millennium Sweeps Across The Globe."

I posted the news. It was sunny out, so I went somewhere.


Is it safe?

Okay so the last post was a long one! These next few are gonna have to be shorter if I want to get this done before my host pulls the plug.

I don't plan on doing a lot of input sanitization, but decided to add a function that scrubs dangerous characters out of a string. This will keep me from wrecking my website when I forget to close an HTML tag somewhere or jam some bad characters into a tag attribute.

It seemed like a straightforward task to implement something that observed the OWASP Rule #1 for preventing cross-site scripting. I started by defining a regular expression to match all the characters I want to escape:

const DISALLOWED = /[&<>"'\/]/gi;

This is the minimum required. However, OWASP Rule #2 recommends escaping all non-alphanumeric characters with ASCII values under 256 to ensure that a string can't break a tag attribute. I could just escape everything, except I'd like to have things like emojis still work.

So: we take the string and examine each character in order. If that character is not alphanumeric and has a code of 255 or less, we replace it with its numeric entity. We assemble a new string by pushing these values (both escaped and not) into an array, which we join at the end.

const ALPHANUM = /[a-z0-9]/i;

export function sanitize(content: string): string {
    const buffer = [];
    for (let i = 0; i < content.length; i++) {
        if (content[i].search(ALPHANUM) < 0 &&
            content[i].charCodeAt(0) < 256) {
            buffer.push(`&#${content[i].charCodeAt(0)};`);
        } else {
            buffer.push(content[i]);
        }
    }
    return buffer.join('');
}
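
To see what this does to real input, here's a runnable sketch. The function body is the same logic as sanitize above, repeated so the snippet stands on its own; the sample strings are made up.

```typescript
const ALPHANUM = /[a-z0-9]/i;

// Same logic as sanitize above, repeated so this snippet runs on its own.
function sanitize(content: string): string {
    const buffer: string[] = [];
    for (let i = 0; i < content.length; i++) {
        const code = content.charCodeAt(i);
        if (!ALPHANUM.test(content[i]) && code < 256) {
            // Non-alphanumeric character below 256: emit a numeric entity.
            buffer.push(`&#${code};`);
        } else {
            // Letters, digits, and anything at 256 or above pass through.
            buffer.push(content[i]);
        }
    }
    return buffer.join('');
}

console.log(sanitize('<b>"hi"</b>')); // &#60;b&#62;&#34;hi&#34;&#60;&#47;b&#62;
console.log(sanitize('a b'));         // a&#32;b
console.log(sanitize('ok 🤷'));       // ok&#32;🤷
```

Note that spaces get escaped too, and that an emoji survives because both halves of its surrogate pair have codes well above 255.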

Fun fact! The efficiency of creating a string with Array::join() versus string concatenation varies by implementation. Since sanitize is meant to be run by Node and not in a browser, it would maybe be more efficient to use String::concat, but most of what I'll be sanitizing will be fairly short strings and I don't think the optimizations will have much impact. I can always change it later. Ugh.


A script to create a post

Let's start by outlining what create-post will do:

Step 1: import a bunch of libraries

I already know I'm going to use the type definitions and the checkType function. I'm going to need a template to take a post and render a Markdown file with a Frontmatter header, so I'll add a function for that in the /templates/ directory. JS template literals make this easy:

// /templates/markdown.ts

import { Post } from '../src/types';

export function markdownPostTemplate(post: Post): string {

return `---
title: "${post.title}"
date: "${post.date}"
filename: ${post.filename}
status: ${post.status}
author_uid: ${post.author_uid}
slug: ${post.slug}
guid: ${post.guid}
thumbnail_image: ${post.thumbnail_image || '' }
opengraph_image: ${post.opengraph_image || '' }
tags: ${post.tags || '' }
excerpt: "${post.excerpt || '' }"

---

${post.content || 'Write a blog post here'}
`;
}

I'm going to want to handle show-stopping fatal errors, so I'll write a small function that displays a message and exits the Node process. (This is mostly because I'm lazy and would rather not write these two lines over and over.)

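// /src/die.ts
// Print a fatal error message and stop the Node process.
export function die(message: string): never {
    console.error(`ERROR: ${message}`);
    process.exit(1); // non-zero exit code tells the shell something went wrong
}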

Now I can import these two functions along with the Post and Blog types, the checkType utility, and the contents of the blog-config.json file.

import { checkType } from './src/checkType';
import { Post, Blog } from './src/types';
import { die } from './src';
import { markdownPostTemplate } from './templates/markdown';

const blog: Blog = require('../blog-config.json');

Next up I'll pull in some NPM libraries: minimist for processing command-line arguments, and user-friendly-date-formatter for managing the hell of date handling. I'll also import Node's fs (file system) library for writing the Markdown files to disk.

Finally I'll use checkType to make sure the data loaded from blog-config.json conforms to the Blog type.

const minimist = require('minimist');
const df = require('user-friendly-date-formatter');
const fs = require('fs');

checkType(blog, 'Blog');
console.log('Blog config loaded and validated.');

Step 2: get title and date parameters

Minimist might be a little heavyweight for what I need, but it sure makes getting arguments easy. I'm including short, one-letter versions of each argument and just grabbing the first one supplied:

let args = minimist(process.argv.slice(2), {
    string: [
        'title', 't', 'date', 'd'
    ],
});

const title = args.title || args.t;
const dateArg = args.date || args.d;

if (!title || title.length === 0) {
    die('No title parameter supplied.');
}

The date parameter is optional. If supplied, the app will try to create a post for the given date. Otherwise, it'll use the current date. JavaScript's built-in Date object can parse date strings, so this works as long as you don't give it something invalid.

let date;
if (dateArg && dateArg.length > 0) {
    date = new Date(dateArg);
    if (isNaN(date.valueOf())) {
        die(`Can't use date: ${dateArg}`);
    }
} else {
    date = new Date();
}
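
The validity check relies on the fact that an unparseable string produces an "Invalid Date" whose time value is NaN. A standalone sketch of just that check, using a hypothetical parseDateArg helper that is not part of create-post:

```typescript
// Hypothetical helper: returns a Date for a parseable string, or null otherwise.
function parseDateArg(dateArg: string): Date | null {
    const date = new Date(dateArg);
    // An unparseable string yields an "Invalid Date" whose time value is NaN.
    return isNaN(date.valueOf()) ? null : date;
}

console.log(parseDateArg('2019-12-16T09:00:00') !== null); // true
console.log(parseDateArg('not a date'));                   // null
```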

That's all the input we need. On to the scary stuff.

Step 3: create an empty post

My basic idea is to take the title and date provided above and pass it to a single function that will write out an empty post file:

createEmptyPost(title, date);

Before I can do that, I'll need a few things:

First, a function that creates and returns the post slug. To do this, I'll want to:

Uh, so:

function createTitleSlug(title: string): string {
    return title.replace(/\s/gi, '_')
                .replace(/[^a-z0-9_]/gi, '')
                .toLowerCase()
                .slice(0, 50);
}
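
A few sample runs make the behavior concrete. The function is copied from above so the snippet is self-contained; the long third title is invented to show the 50-character cutoff.

```typescript
// Copied from above: whitespace to underscores, strip everything that isn't
// a letter, digit, or underscore, lowercase, then cap at 50 characters.
function createTitleSlug(title: string): string {
    return title.replace(/\s/gi, '_')
                .replace(/[^a-z0-9_]/gi, '')
                .toLowerCase()
                .slice(0, 50);
}

console.log(createTitleSlug('We have ignition!'));
// we_have_ignition

console.log(createTitleSlug("Baby step: okay let's write some config stuff"));
// baby_step_okay_lets_write_some_config_stuff

const longTitle = 'The quick brown fox jumps over the lazy dog again and again and again';
console.log(createTitleSlug(longTitle).length); // 50
```

One thing to keep in mind: a title with no ASCII letters or digits at all comes out as an empty slug.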

Next, a function that takes the slug and date, and creates a filename:

function createFileName(titleSlug: string, date: Date): string {
    const formattedDate = df(date, '%YYYY-%MM-%DD_%H-%m-%s-%l');
    return `${formattedDate}_${titleSlug}.md`;
}

Then a function that creates a file path to be used as a permalink. My original plan was to configure this path using the archive_format field of blog-config.json but decided nah, archives are daily, because I say so.

function createArchivePath(blog: Blog, titleSlug: string, date: Date): string {
    const format = '%YYYY/%MM/%DD';
    const dateFragment = df(date, format);
    return `/${blog.root}/${dateFragment}/${titleSlug}/`;
}
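
To show the shape of the resulting permalink without pulling in the date library, here's a dependency-free sketch. It swaps the df call for plain Date getters and a hypothetical pad2 helper, and it assumes blog.root is 'posts'; the real function uses user-friendly-date-formatter as shown above.

```typescript
// Minimal stand-in for the Blog type; only `root` is needed here.
interface Blog { root: string; }

// Hypothetical helper: zero-pads a number to two digits, e.g. 3 -> "03".
function pad2(value: number): string {
    return value < 10 ? `0${value}` : String(value);
}

// Variant of createArchivePath that builds the date fragment by hand.
function createArchivePath(blog: Blog, titleSlug: string, date: Date): string {
    const dateFragment = [
        date.getFullYear(),
        pad2(date.getMonth() + 1), // getMonth() is zero-based
        pad2(date.getDate()),
    ].join('/');
    return `/${blog.root}/${dateFragment}/${titleSlug}/`;
}

// new Date(year, monthIndex, day) uses local time, matching the getters above.
const archivePath = createArchivePath({ root: 'posts' }, 'hello_world', new Date(2019, 11, 3));
console.log(archivePath); // /posts/2019/12/03/hello_world/
```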

Finally, the function that writes the Markdown file using the markdownPostTemplate from above. This function will create the /posts/ directory if it doesn't exist. The "wx" flag passed to writeFileSync should ensure the operation fails if a file with the same name already exists.

function createPostFile(filename: string, post: Post) {
    if (!fs.existsSync('./posts')){
        fs.mkdirSync('./posts');
    }
    const content = markdownPostTemplate(post);
    try {
        fs.writeFileSync(`./posts/${filename}`, content, { flag: 'wx' });
    } catch (err) {
        die(`Could not write Markdown file: ${err.message}`);
    }
}
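
Here's a small standalone sketch of the "wx" behavior, run against a throwaway temp directory rather than the real /posts/ folder. The directory prefix and file contents are made up for the example.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Work in a fresh temporary directory so the sketch never touches real posts.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'wx-demo-'));
const file = path.join(dir, 'post.md');

// First write succeeds because the file does not exist yet.
fs.writeFileSync(file, 'first draft', { flag: 'wx' });

// Second write with the same flag throws instead of overwriting.
let errorCode = '';
try {
    fs.writeFileSync(file, 'second draft', { flag: 'wx' });
} catch (err) {
    errorCode = (err as { code?: string }).code ?? '';
}

console.log(errorCode);                           // EEXIST
const contents = fs.readFileSync(file, 'utf8');
console.log(contents);                            // first draft

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true });
```

The original contents survive the failed second write, which is exactly the safety net wanted for a script that could otherwise clobber a finished post.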

Now that we have all the helper functions, here's the body of createEmptyPost:

function createEmptyPost(title: string, date: Date) {

    const titleSlug = createTitleSlug(title);
    const filename = createFileName(titleSlug, date);
    const postLink = createArchivePath(blog, titleSlug, date);

    const postDate = df(date, '%YYYY-%MM-%DD %H:%m:%s');

    const post: Post = {
        title: title,
        date: postDate,
        filename: filename,
        author_uid: blog.authors[0].author_uid,
        status: 'publish',
        slug: titleSlug,
        guid: postLink
    };

    createPostFile(filename, post);
    console.log(`Created file for post: ${post.title}`);
}

Okay! So, after compiling all of this with tsc, I can run:

node ./build/create-post.js -t "A script to create a post" -d "2019-12-16 09:00:00"

Did it work?

% ls posts
2019-12-03_22-10-19-000_hello_world.md
2019-12-04_20-00-00-000_lets_start_with_some_types.md
2019-12-04_20-00-00-001_baby_step_okay_lets_write_some_config_stuff.md
2019-12-04_20-00-00-002_adventures_in_js_type_checking.md
2019-12-05_17-26-40-406_some_admin_notes.md
2019-12-10_22-15-55-728_we_have_ignition.md
2019-12-11_22-10-19-020_baby_step_some_environment_notes.md
2019-12-16_9-00-00-000_a_script_to_create_a_post.md # <- here it is!
%

And the contents?

---
title: "A script to create a post"
date: 2019-12-16 9:00:00
filename: 2019-12-16_9-00-00-000_a_script_to_create_a_post.md
status: publish
author_uid: scottandrew
slug: a_script_to_create_a_post
guid: /posts/2019/12/16/a_script_to_create_a_post/
thumbnail_image: 
opengraph_image: 
tags: 
excerpt: ""
---

Write a blog post here

How about a missing title?

% node ./build/create-post.js
Blog config loaded and validated.
ERROR: No title parameter supplied.
%

An invalid date?

% node ./build/create-post.js -t "A new post" -d "2019-12-32"
Blog config loaded and validated.
ERROR: Can't use date: 2019-12-32
%

A non-unique filename?

% node ./build/create-post.js -t "A script to create a post" -d "2019-12-16 09:00:00"
Blog config loaded and validated.
ERROR: Could not write Markdown file: EEXIST: file already exists, open './posts/2019-12-16_9-00-00-000_a_script_to_create_a_post.md'
%

So far so good! But there are problems, which I'll write about next.


Baby step: some environment notes

I wanted to back up and write a bit about the setup for this blog app project. I mentioned using TypeScript, but not how.

Earlier, I installed TypeScript on my laptop:

% npm install -g typescript

TypeScript comes with access to the tsc command, which compiles TypeScript files to JavaScript that can be run with node.

Here's my project directory structure so far (after some cleanup):

/build/             /* compiled JS files */
/html/              /* final blog output files */
/posts/             /* Markdown source files */
/src/               /* source TypeScript files */
    /types          /* custom type definitions */
/templates/         /* template files for rendering */
app.ts              /* app that generates the blog */
create-post.ts      /* app that creates new post files */
blog-config.json
package.json

The blog-config.json file is described here and contains metadata about the blog overall. (I have renamed it with a hyphen from an underscore, to be consistent with other file naming conventions.)

The package.json file has a scripts entry that reads:

"scripts": {
    "build": "tsc --outDir build app.ts create-post.ts"
},

When I run npm run build in this directory, the top-level app files app.ts and create-post.ts will be compiled from TypeScript to JavaScript and placed in the build directory. This should also recursively compile any TS files those apps depend on, such as anything referenced in the src and templates folder.

So the idea here is to create a workflow where:

  1. I run create-post to generate an empty Markdown file in /posts/ with some pre-filled Frontmatter metadata.
  2. I write the post body and tweak metadata with a text editor.
  3. I run app, which chews through everything in /posts/ and creates all the files and indexes, using the templates in /templates/ and placing the results in /html/
  4. Finally I upload the contents of /html/ to my site, wherever that ends up being.

Pretty straightforward. Well, you'd think. Anyway, gotta start somewhere, so next I'll be writing about create-post.


We have ignition!

Victory! This blog is now generated by a Node app that creates an RSS feed, an index and individual archives by date. I realized I had to race ahead and get the basic stuff up and running to save my sanity from updating the index and RSS feed by hand. Hopefully this will straighten out the RSS issues. Also: permalinks!

I'm still refining the core engine and working on making the code less gross.

So far I haven't had to install a ton of npm libraries, although the ones I've installed so far have done a lot of heavy lifting:

I have also rediscovered the sheer hell of date handling. I'm using a library for date and time formatting, but it's pretty limited in what it offers. Moment seems to be the hotness for this, but seems kind of heavy (2.8 MB!) for my purposes.

To make my life easier, I wrote a separate script called create-post that outputs an empty Markdown file with some defaults. I'll write about this later, but to create this post, I ran:

node ./build/create-post.js --title 'We have ignition!'

And the output was a file with the following Frontmatter header:

---
title: "We have ignition!"
date: 2019-12-10 22:15:55
filename: 2019-12-10_22-15-55-728_we_have_ignition.md
status: publish
author_uid: scottandrew
slug: we_have_ignition
guid: /posts/2019/12/10/we_have_ignition/
thumbnail_image: 
opengraph_image: 
tags: 
excerpt:
---

A good start! Anyway, looking forward to when I inevitably feed something to this app that breaks it.


Some admin notes

If this blog is acting wobbly in your feed reader, it's probably because I'm updating the RSS feed by hand and getting things horribly wrong. There are no permalinks yet and the pub dates are basically lies. Hopefully this will be temporary, as generating a feed is the first priority of this project. But that's still a long way away. So my apologies if you start seeing re-published entries and the like.

Also, I just couldn't take the vanilla, 1993 hypertext look anymore so I had to add a bit of styling, which you won't see in a feed reader.


Adventures in JS type checking

In the last post I made a config file for this blog project that conformed to the Blog interface described in a previous post. The goal was to have the app load this file and validate it. I got stuck almost immediately.

TypeScript interfaces aren't actually that useful for data validation. That's because TS types are only used when you're developing the application. They're not used when you run the application. In fact, they're removed from the final codebase during compilation.

This feels counterintuitive. If you write a function that takes an argument of type Foo, and then elsewhere write some code that calls that function with a value of type Bar, the compiler (and maybe your IDE) will yell at you. But if your application loads some data from a file or Ajax request and passes it to that function, it's too late to check if that data conforms to the Foo type definition.

That's basically what this blog application is going to do: load a lot of JSON, YAML and Markdown text and parse it into HTML files at runtime. This TypeScript caveat means that the app can't tell if a blog post is missing a required field, like a title, unless I write code that checks for it and every other required field. So what's the use of having types if we can't use them for data validation when it counts?
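
The gap is easy to demonstrate. In this sketch, a two-field Post interface (a cut-down assumption, not the real type) is used to cast parsed JSON. The compiler is satisfied, but nothing is checked at runtime until a hand-written guard like the hypothetical isPost does the work.

```typescript
interface Post {
    title: string;
    date: string;
}

// Data as it might arrive from disk: the required title is missing.
const raw = '{"date": "2019-12-04 20:00:00"}';

// The cast satisfies the compiler, but nothing is checked when this runs.
const unchecked = JSON.parse(raw) as Post;
console.log(unchecked.title); // undefined

// Hand-written runtime check (a type guard) for the two fields above.
function isPost(value: unknown): value is Post {
    if (typeof value !== 'object' || value === null) {
        return false;
    }
    const candidate = value as Record<string, unknown>;
    return typeof candidate.title === 'string' && typeof candidate.date === 'string';
}

console.log(isPost(JSON.parse(raw)));                              // false
console.log(isPost({ title: 'Hello world', date: '2019-12-03' })); // true
```

Writing a guard like this for every field of every type is the chore a generated validator takes off your hands.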

OSS to the rescue. The ts-interface-checker library generates data validators from TypeScript type definitions. These validators (called "checkers") can be used at runtime, so we can take a data object, like a blog post, and check that it conforms to the Post type, even though the type itself was compiled out of the codebase.

I added a build step that regenerates the checkers automatically so they stay in sync with the type definitions, then wrote a wrapper function that can check any object type:

// src/checkType.ts

import exportedTypeSuite from '../types/index-ti';
import { createCheckers } from "ts-interface-checker";

const TypeChecker = createCheckers(exportedTypeSuite);

export function checkType(object: any, type: string) {
    if (!TypeChecker[type]) {
        throw new Error(`checkType: type ${type} is undefined.`);
    }
    try {
        TypeChecker[type].check(object);
    } catch (err) {
        console.error(`checkType: object does not validate as type ${type}.`);
        console.error(err);
    }
}

export default checkType;

Now I can load the blog config JSON file and validate it at runtime!

// index.js (the main app for now)

import { checkType } from './src/checkType';

const CONFIG_PATH = '../blog_config.json';
const config = require(CONFIG_PATH);

checkType(config, 'Blog');

console.log('Blog config loaded and validated.');

Before this, my app was happily loading the blog config as an object of type Blog, even when I removed all of the fields. Now it can tell me exactly what fields are missing or have incorrect types. Perfect.


Baby step: okay let's write some config stuff

I created the Blog type interface to describe blog metadata and define things like archive format. The idea here is to write a JSON config file that conforms to this interface:

{
    "title": "Scott Andrew",
    "url": "https://scottandrew.com",
    "description": "Scott Andrew makes stuff for the web, draws comics, and plays in bands. He thinks you're pretty cool.",
    "language": "en-US",
    "archiveFormat": "YearMonthDay",
    "archiveWithSlug": true,
    "indexPosts": 10,
    "authors" : [
        {
            "name": "Scott Andrew",
            "contact": "scottandrew@gmail.com",
            "urls" : [
                {
                    "title": "Link to scottandrew.com",
                    "text": "Website",
                    "url": "https://scottandrew.com"
                },
                {
                    "title": "Link to Twitter",
                    "text": "Twitter",
                    "url": "https://twitter.com/scottandrew"
                },
                {
                    "title": "Link to Instagram",
                    "text": "Instagram",
                    "url": "https://twitter.com/scottandrew"
                }
            ]
        }
    ]
}

As you can see I'm already scope creeping by supporting multiple authors. o_O
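
The Blog interface itself isn't shown in this post, so here is a guess at its shape, inferred purely from the fields in the JSON above. The AuthorUrl and Author names and every field type are assumptions; the real definitions in the project may differ.

```typescript
// Hypothetical types inferred from the sample config; not the project's actual definitions.
interface AuthorUrl {
    title: string;
    text: string;
    url: string;
}

interface Author {
    name: string;
    contact: string;
    urls: AuthorUrl[];
}

interface Blog {
    title: string;
    url: string;
    description: string;
    language: string;
    archiveFormat: string;
    archiveWithSlug: boolean;
    indexPosts: number;
    authors: Author[]; // an array, hence the multiple-author scope creep
}

// A trimmed-down config object that type-checks against the sketch.
const sampleConfig: Blog = {
    title: 'Scott Andrew',
    url: 'https://scottandrew.com',
    description: 'A sample description.',
    language: 'en-US',
    archiveFormat: 'YearMonthDay',
    archiveWithSlug: true,
    indexPosts: 10,
    authors: [{ name: 'Scott Andrew', contact: 'scottandrew@gmail.com', urls: [] }],
};

console.log(sampleConfig.authors[0].name); // Scott Andrew
```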

The next baby step is to just write a simple Node app that loads this config file and validates it.

