June 27, 2007, 10:14 p.m.

The Fastest Month Parser in the West

I was trying to remove bottlenecks in my logmerge program when I noticed it was spending more time than I thought it should parsing months. Going back to my fast and easy token classification in C post, I figured maybe it'd be a good idea to not have my algorithm be O(n) (where n == 12).
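For reference, a naïve O(n) month parser of the kind I'm talking about might look roughly like the sketch below. The name parseMonthNaive and the lookup table are made up for illustration; this is not the actual logmerge code.

```c
#include <string.h>

/* Hypothetical baseline: compare the input against each month name in turn.
 * Returns 0..11 for Jan..Dec, or -1 if nothing matches. */
static int parseMonthNaive(const char *input) {
    static const char *const months[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    for (int i = 0; i < 12; i++) {
        /* Only the three-letter prefix is compared. */
        if (strncmp(input, months[i], 3) == 0) {
            return i;
        }
    }
    return -1;
}
```

A December timestamp costs twelve string comparisons here, which is where the O(n) comes from.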

The first thing I did was replace my naïve month parser with a generated one using the script from that post. This was, of course, faster, but it still kept showing up in my profiler.
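I won't reproduce the generated code here, but a character-by-character switch of the general shape such generators tend to produce could look something like this. It is a hand-written sketch under that assumption, not the script's real output, and parseMonthSwitch is a hypothetical name.

```c
/* Hypothetical hand-written approximation of a generated, switch-based parser.
 * Each branch looks only at the characters needed to tell the months apart.
 * Returns 0..11 for Jan..Dec, or -1 if nothing matches. */
static int parseMonthSwitch(const char *s) {
    switch (s[0]) {
    case 'J':
        if (s[1] == 'a') return s[2] == 'n' ? 0 : -1;
        if (s[1] == 'u') {
            if (s[2] == 'n') return 5;
            if (s[2] == 'l') return 6;
        }
        return -1;
    case 'F': return (s[1] == 'e' && s[2] == 'b') ? 1 : -1;
    case 'M':
        if (s[1] == 'a') {
            if (s[2] == 'r') return 2;
            if (s[2] == 'y') return 4;
        }
        return -1;
    case 'A':
        if (s[1] == 'p') return s[2] == 'r' ? 3 : -1;
        if (s[1] == 'u') return s[2] == 'g' ? 7 : -1;
        return -1;
    case 'S': return (s[1] == 'e' && s[2] == 'p') ? 8 : -1;
    case 'O': return (s[1] == 'c' && s[2] == 't') ? 9 : -1;
    case 'N': return (s[1] == 'o' && s[2] == 'v') ? 10 : -1;
    case 'D': return (s[1] == 'e' && s[2] == 'c') ? 11 : -1;
    default:  return -1;
    }
}
```

This avoids the loop over twelve strings, but it still branches on up to three characters per call.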

I thought of what I felt was a clever adaptive algorithm that's O(1) in the normal case (current month is the same as previous month). It was clever, and a little more readable, but not fast.
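The adaptive idea can be sketched like this: remember the last month that matched and try it first, falling back to a full scan only when the month changes. This is a reconstruction of the idea rather than the code I actually wrote, and the names parseMonthAdaptive and kMonths are assumptions.

```c
#include <string.h>

static const char *const kMonths[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Hypothetical adaptive parser: one comparison when the month repeats,
 * a linear scan otherwise. Returns 0..11, or -1 if nothing matches.
 * The static cache makes this unsuitable for concurrent callers. */
static int parseMonthAdaptive(const char *input) {
    static int last = 0; /* index of the most recently matched month */

    /* Fast path: log lines are mostly in the same month as the previous one. */
    if (strncmp(input, kMonths[last], 3) == 0) {
        return last;
    }

    /* Slow path: scan everything and remember the new month. */
    for (int i = 0; i < 12; i++) {
        if (strncmp(input, kMonths[i], 3) == 0) {
            last = i;
            return i;
        }
    }
    return -1;
}
```

Even on the fast path this still pays for a strncmp call, which is a plausible reason it would not beat a single integer comparison.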

Then I showered or something and the ultimate solution hit me. It's easier to read than the naïve solution, and faster than anything else I've been able to figure out how to do (with the help of an optimizing C compiler). It looks like this:

static int parseMonth(const char *input) {
	int rv=-1;
	int inputInt=0;

	/* Pack up to four input bytes into one int, first byte in the high position. */
	for(int i=0; i<4 && input[i]; i++) {
		inputInt = (inputInt << 8) | input[i];
	}

	switch(inputInt) {
		case 'Jan/': rv=0; break;
		case 'Feb/': rv=1; break;
		case 'Mar/': rv=2; break;
		case 'Apr/': rv=3; break;
		case 'May/': rv=4; break;
		case 'Jun/': rv=5; break;
		case 'Jul/': rv=6; break;
		case 'Aug/': rv=7; break;
		case 'Sep/': rv=8; break;
		case 'Oct/': rv=9; break;
		case 'Nov/': rv=10; break;
		case 'Dec/': rv=11; break;
	}

	return rv;
}

The multi-character constant syntax ('Jan/' as an int) is fairly obscure (outside of Mac programming, anyway). The C standard leaves the value of such a constant implementation-defined, but it compiles and runs correctly on every C compiler I've got and on both big- and little-endian machines. It does assume 32-bit integers and at least four bytes in whatever buffer it receives, but both of those are met on my system, and when assertions are compiled in, it self-tests at startup.
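If relying on how a compiler evaluates multi-character constants is a concern, the same trick can be spelled out with an explicit macro. The sketch below is one way to do that, assuming ASCII month names; MONTH_KEY, parseMonthPortable and selfTestParseMonth are hypothetical names, and the self-test only illustrates the kind of startup check described above, not the one in logmerge.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit key explicitly, first character in the high byte,
 * instead of relying on the value of a multi-character constant. */
#define MONTH_KEY(a, b, c, d) \
    (((uint32_t)(unsigned char)(a) << 24) | \
     ((uint32_t)(unsigned char)(b) << 16) | \
     ((uint32_t)(unsigned char)(c) << 8)  | \
      (uint32_t)(unsigned char)(d))

/* Returns 0..11 for "Jan/".."Dec/", or -1 if nothing matches. */
static int parseMonthPortable(const char *input) {
    uint32_t key = 0;

    /* Pack up to four bytes the same way the macro does. */
    for (int i = 0; i < 4 && input[i]; i++) {
        key = (key << 8) | (unsigned char)input[i];
    }

    switch (key) {
    case MONTH_KEY('J', 'a', 'n', '/'): return 0;
    case MONTH_KEY('F', 'e', 'b', '/'): return 1;
    case MONTH_KEY('M', 'a', 'r', '/'): return 2;
    case MONTH_KEY('A', 'p', 'r', '/'): return 3;
    case MONTH_KEY('M', 'a', 'y', '/'): return 4;
    case MONTH_KEY('J', 'u', 'n', '/'): return 5;
    case MONTH_KEY('J', 'u', 'l', '/'): return 6;
    case MONTH_KEY('A', 'u', 'g', '/'): return 7;
    case MONTH_KEY('S', 'e', 'p', '/'): return 8;
    case MONTH_KEY('O', 'c', 't', '/'): return 9;
    case MONTH_KEY('N', 'o', 'v', '/'): return 10;
    case MONTH_KEY('D', 'e', 'c', '/'): return 11;
    }
    return -1;
}

/* Startup self-test: compiled out when NDEBUG is defined. */
static void selfTestParseMonth(void) {
    static const char *const names[12] = {
        "Jan/", "Feb/", "Mar/", "Apr/", "May/", "Jun/",
        "Jul/", "Aug/", "Sep/", "Oct/", "Nov/", "Dec/"
    };

    for (int i = 0; i < 12; i++) {
        assert(parseMonthPortable(names[i]) == i);
    }
    assert(parseMonthPortable("Foo/") == -1);
}
```

Calling selfTestParseMonth() once at startup catches any mismatch between the packed input and the case labels. The cost is a wordier switch than the 'Jan/' version.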

I'd assume someone has written that exact code before, but it was new to me, so I figured I'd share it.
