What's the Deal With Micro-Optimizations?
I don't get why people make the biggest deal out of the parts of a system with the least significance.
I was trying to standardize what is effectively a web app in the hope of getting it running under JBoss (or some other normal container). This application did some initialization in a main() and then started Jetty to handle HTTP requests.
[Guy at work] said that this one particular piece would probably not be placed into JBoss because of the overhead, so I shouldn't bother moving forward on what I was doing. This is where it got awkward for me. In addition to just restructuring the code for an app server, I've also been migrating the build over to Maven and writing lots and lots of unit tests.
Along with unit tests comes inevitable refactoring. I was looking at some of the components of the code, and they're all these O(n) searches sitting inside of methods that hold a lock while performing multiple database transactions. Basically, in this one piece, there are two operations that acquire a lock on the instance:
- Loading stuff from the DB
- Finding stuff related to a particular entity, and then clearing it in the DB
Let's describe the algorithm of each of these independently:
Original Implementation
Item 1: Load Stuff from DB
- Acquire Lock
- Obtain database connection from pool
- Issue query (give me all the stuff not marked done)
- Convert each result set row to an object and place in a collection
- Return connection to pool
- Store collection in object
- Release lock
Item 2: Get Stuff for Entity
- Obtain lock
- Scan entire collection of objects and check to see if each one matches
- For each one that matches:
- Remove object from collection
- Obtain database connection from pool
- Issue update marking object as complete
- Return connection to pool
- Release lock
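In code, the original Item 2 would look something like this. This is a minimal sketch with hypothetical names (OriginalStore, Work, markCompleteInDb); the real code obviously did more, but the shape is what matters: a full O(n) scan, with database round-trips, all inside the lock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout -- a minimal sketch of the original shape.
class OriginalStore {
    private final List<Work> pending = new ArrayList<>();

    static class Work {
        final int entityId;
        Work(int entityId) { this.entityId = entityId; }
    }

    void add(Work w) {
        synchronized (this) { pending.add(w); }
    }

    // Item 2: O(n) scan over the whole collection, DB calls inside the lock.
    synchronized List<Work> getStuffForEntity(int entityId) {
        List<Work> matches = new ArrayList<>();
        Iterator<Work> it = pending.iterator();
        while (it.hasNext()) {
            Work w = it.next();
            if (w.entityId == entityId) {
                it.remove();
                matches.add(w);
                markCompleteInDb(w);  // database round-trip while holding the lock
            }
        }
        return matches;
    }

    // Stand-in for: obtain connection from pool, issue UPDATE, return connection.
    void markCompleteInDb(Work w) { }
}
```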
That doesn't seem right to me. After writing the unit tests and modeling the behavior very closely (error conditions, etc.), I made mine work as follows:
Refactored Implementation
Item 1: Load Stuff from DB
- Obtain database connection from pool
- Issue query (give me all the stuff not marked done)
- Convert each result set row to an object and place in a collection
- Return connection to pool
- Obtain lock
- Filter collection against the current blacklist (see below)
- Store collection in object indexed by entity key (Map<Integer, List<Whatever>>)
- Release lock
Item 2: Get Stuff for Entity
- Obtain lock
- Get list of stuff for entity -- an O(1) operation
- Add list to the blacklist so they won't get picked up on the next DB load, just in case they're not cleared in time
- Release lock
- For each object we're returning:
- Remove object from collection
- Obtain database connection from pool
- Issue update marking object as complete
- Return connection to pool
Now, the DB work still isn't optimal, but there's no longer any DB work in the critical section: just in-memory structure manipulation, and quite fast at that.
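The refactored version, again as a sketch with names of my own invention, keeps only the map and blacklist manipulation under the lock and pushes all connection-pool work outside of it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names -- a sketch of the refactored shape: the lock guards only
// in-memory structures; database work happens outside the critical section.
class RefactoredStore {
    private final Map<Integer, List<Work>> byEntity = new HashMap<>();
    private final Set<Work> blacklist = new HashSet<>();

    static class Work {
        final int entityId;
        Work(int entityId) { this.entityId = entityId; }
    }

    // Item 1: the query runs outside the lock; rows arrive here already fetched,
    // with the connection returned to the pool. Only the swap is locked.
    void loadFromDb(List<Work> freshRows) {
        synchronized (this) {
            byEntity.clear();
            for (Work w : freshRows) {
                if (blacklist.contains(w)) continue;  // filter against the blacklist
                byEntity.computeIfAbsent(w.entityId, k -> new ArrayList<>()).add(w);
            }
        }
    }

    // Item 2: O(1) map lookup under the lock; DB updates happen afterwards.
    List<Work> getStuffForEntity(int entityId) {
        List<Work> matches;
        synchronized (this) {
            matches = byEntity.remove(entityId);
            if (matches == null) return Collections.emptyList();
            blacklist.addAll(matches);  // don't pick them up on the next DB load
        }
        for (Work w : matches) {
            markCompleteInDb(w);  // connection-pool work outside the lock
        }
        return matches;
    }

    // Stand-in for: obtain connection, UPDATE ... SET done, return connection.
    void markCompleteInDb(Work w) { }
}
```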
On my quest to convert O(n) operations to O(1), I also tested and then rewrote this resource allocation thing. The resources being allocated were port numbers, and the mechanism involved three boolean arrays of 65,536 entries each, which were used to tell us which things were in use. There were three because one tracked what the application was allocating, one tracked what was discovered based on what the network stack would report, and the third prevented the app from having to allocate a new 64K-entry boolean array whenever it wanted to refresh the stuff from the OS.
The allocation strategy was to start counting at whatever the bottom limit number was and check the two active boolean arrays to determine whether each resource was in use. That gets really fun towards the end of the range. There was some synchronization in there, too, but I don't think it was correct, and I didn't really bother analyzing it.
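That strategy looked roughly like this (a sketch with hypothetical names; the third, scratch array is omitted from the scan since it only existed to avoid reallocation during refresh):

```java
// Hypothetical sketch of the original allocator: a linear scan over boolean arrays.
class OldPortAllocator {
    private static final int MIN_PORT = 1024;  // assumed bottom limit
    private final boolean[] allocated = new boolean[65536];   // allocated by the app
    private final boolean[] inUseByOs = new boolean[65536];   // reported by the network stack
    private final boolean[] scratch = new boolean[65536];     // reused buffer for OS refreshes

    // O(n) in the worst case: the fuller the port space, the longer the scan.
    synchronized int allocate() {
        for (int port = MIN_PORT; port < allocated.length; port++) {
            if (!allocated[port] && !inUseByOs[port]) {
                allocated[port] = true;
                return port;
            }
        }
        throw new IllegalStateException("out of ports");
    }

    synchronized void markInUseByOs(int port) {
        inUseByOs[port] = true;
    }
}
```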
With a test modeling the behavior, I decided to express the whole thing in terms of a single set of integers. Any number within my set was available. All I have to do to allocate a port is pop the first entry of the set off and we're good. Whenever we notice new things in use by the network stack, we pull them out as well. Effectively, we're O(1) (OK, really O(log n), because I used a TreeSet to preserve allocation order so the existing unit tests would continue to work). My unit test (which included running out of the resource) went from about a 15-second runtime to about a half-second runtime.
Hopefully the optimization and testing buy us enough that we can take the overhead of having our services invoked by JBoss.