Is it safe?

Okay, so the last post was a long one! These next few are gonna have to be shorter if I want to get this done before my host pulls the plug.

I don't plan on doing a lot of input sanitization, but I decided to add a function that escapes dangerous characters in a string. This will keep me from wrecking my website when I forget to close an HTML tag somewhere or jam some bad characters into a tag attribute.

It seemed like a straightforward task: implement something that follows OWASP Rule #1 for preventing cross-site scripting (HTML-escape untrusted data before putting it into element content). I started by defining a regular expression to match all the characters I want to escape:

const DISALLOWED = /[&<>"'\/]/g;
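On its own, a regex like that would usually drive a `replace()` call with a lookup table. Here's a minimal sketch of that, assuming one entity per character; the `escapeRule1` and `ENTITIES` names are hypothetical, and the entity choices follow the table the OWASP cheat sheet has long used for Rule #1:

```typescript
// The Rule #1 character set. No `i` flag needed: the class has no letters.
const DISALLOWED = /[&<>"'\/]/g;

// Hypothetical lookup table. ' and / get numeric entities because
// they have no named entity in HTML 4.
const ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

// Swap every match of DISALLOWED for its entity.
function escapeRule1(content: string): string {
  return content.replace(DISALLOWED, (ch) => ENTITIES[ch] ?? ch);
}

console.log(escapeRule1(`<a href="/x">Tom & Jerry's</a>`));
// &lt;a href=&quot;&#x2F;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;&#x2F;a&gt;
```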

That covers the minimum for element content. However, OWASP Rule #2 recommends escaping every non-alphanumeric character with a character code below 256, so that a string can't break out of a tag attribute. I could just escape everything, but I'd like things like emoji to keep working.
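A quick sketch of the gap Rule #2 closes, using a made-up payload aimed at an attribute the template forgot to quote. None of its characters are in the Rule #1 set, so Rule #1 escaping would pass it straight through:

```typescript
// Same Rule #1 class, minus the `g` flag so test() keeps no lastIndex state.
const RULE1_CHARS = /[&<>"'\/]/;

// Hypothetical payload targeting an unquoted attribute.
const payload = "x onmouseover=alert(1)";

// No Rule #1 characters, so Rule #1 escaping would leave it unchanged...
const untouchedByRule1 = !RULE1_CHARS.test(payload);

// ...yet in an unquoted attribute the space ends the value, and the
// rest becomes a brand-new event-handler attribute.
const html = `<div title=${payload}>hover me</div>`;

console.log(untouchedByRule1); // true
console.log(html); // <div title=x onmouseover=alert(1)>hover me</div>
```

Escaping per Rule #2 turns the space and `=` into `&#32;` and `&#61;`, so the whole payload stays inside the `title` value.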

So: we take the string and examine each character in order. If a character is not alphanumeric and has a code of 255 or less, we replace it with its numeric character reference (for example, `<` becomes `&#60;`). We build the new string by pushing these values, escaped or not, into an array, which we join at the end.

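// Letters and digits are always safe to emit as-is.
const ALPHANUM = /[a-z0-9]/i;

export function sanitize(content: string): string {
    const buffer: string[] = [];
    for (let i = 0; i < content.length; i++) {
        const code = content.charCodeAt(i);
        // OWASP Rule #2: escape every non-alphanumeric character below 256
        // as a numeric entity. Anything higher (e.g. emoji) passes through.
        if (!ALPHANUM.test(content[i]) && code < 256) {
            buffer.push(`&#${code};`);
        } else {
            buffer.push(content[i]);
        }
    }
    return buffer.join('');
}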
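A usage sketch, with the same escaping rule written using `for...of` (the `sanitizeByCodePoint` name is hypothetical). `for...of` walks a string by code point rather than by UTF-16 code unit, but the output should match the index-based loop: surrogate halves all sit above 255, so an emoji was never going to be escaped either way. The payoff is mostly readability.

```typescript
const ALPHANUM = /[a-z0-9]/i;

// Same rule as sanitize(), iterating by Unicode code point.
function sanitizeByCodePoint(content: string): string {
  const buffer: string[] = [];
  for (const ch of content) {
    const code = ch.codePointAt(0) ?? 0;
    // Escape non-alphanumerics below 256; leave everything else alone.
    buffer.push(!ALPHANUM.test(ch) && code < 256 ? `&#${code};` : ch);
  }
  return buffer.join("");
}

console.log(sanitizeByCodePoint(`" onmouseover="alert(1)`));
// &#34;&#32;onmouseover&#61;&#34;alert&#40;1&#41;
console.log(sanitizeByCodePoint("hi 👋"));
// hi&#32;👋
```

Note that the alphanumeric check is ASCII-only, so an accented letter like `é` (code 233) gets escaped to `&#233;` too. That's harmless, since the browser renders it the same.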

Fun fact! How efficient it is to build a string with Array::join() versus string concatenation varies by implementation. Since sanitize is meant to run in Node rather than in a browser, String::concat might be more efficient, but most of what I'll be sanitizing will be fairly short strings, so I don't expect the optimization to make much difference. I can always change it later. Ugh.
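If I ever do revisit the join-versus-concat question, a rough harness like this would be a starting point. It's only a sketch: `escapeChar`, `sanitizeJoin`, `sanitizeConcat`, and `time` are hypothetical names, the concatenation version uses `+=` instead of calling `String.prototype.concat()`, and `Date.now()` is far too coarse for a real benchmark.

```typescript
const ALPHANUM = /[a-z0-9]/i;

// Escape one UTF-16 code unit using the same rule as sanitize().
function escapeChar(ch: string): string {
  const code = ch.charCodeAt(0);
  return !ALPHANUM.test(ch) && code < 256 ? `&#${code};` : ch;
}

// Variant 1: collect pieces in an array and join once (what sanitize() does).
function sanitizeJoin(content: string): string {
  const buffer: string[] = [];
  for (let i = 0; i < content.length; i++) buffer.push(escapeChar(content.charAt(i)));
  return buffer.join("");
}

// Variant 2: accumulate the result with += concatenation.
function sanitizeConcat(content: string): string {
  let out = "";
  for (let i = 0; i < content.length; i++) out += escapeChar(content.charAt(i));
  return out;
}

// Crude timing loop. Results vary by machine and Node version.
function time(label: string, fn: (s: string) => string, input: string, runs: number): void {
  const start = Date.now();
  for (let r = 0; r < runs; r++) fn(input);
  console.log(`${label}: ${Date.now() - start} ms for ${runs} runs`);
}

// A short, blog-sized input, which is the case that actually matters here.
const sample = `Tom & Jerry's <b>"big"</b> day 👋`;
time("join", sanitizeJoin, sample, 50000);
time("concat", sanitizeConcat, sample, 50000);
```

Both variants return identical strings; only the way the result is accumulated differs, so swapping one for the other later shouldn't change any output.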


Posted in: blogging, code, javascript, node
