March 23, 2007, 3:58 p.m.

A Parallel Map in Java

I was just playing around and came up with implementing as transparent of a parallel map as I could in java. It worked fairly well, although required more typing than I'd have liked. I intend to polish it up and add it to my common java library.

The basic goal was to try to express a typesafe transformation of a list of objects into a list of objects of a different type where that transformation may be executed in parallel without losing the sequence of objects. Fundamentally, java's java.util.concurrent.ExecutorService does a lot of the work . I was actually hoping to implement that part, too, but theirs looks almost exactly how I was imagining mine looking.

Here's the basic concept: I've got a list of things of a certain type. I want to make a new list of things where each member represents the result of the member of the first list at the same position having passed through a transformation function.

To provide a more concrete example, let's say I have a list of the integers from one to five (inclusive). I would like to make a new list where each number is squared and then converted to a string. I would express that like this in haskell:

Hugs> map (\x -> show (x * x)) [1..5]
["1","4","9","16","25"]

Now, what if that transformation is expensive (or sometimes expensive)? We've all got lots of processors now, so we'll do them in parallel. I might create a pmap function that takes some kind of function executor that does parallel magic behind the scene, add another argument in the beginning (or just curry it earlier on), and the bulk of the code doesn't have to change.

However, the way this is typically done in java is to build a new list, iterate the current list, perform the transformation code on each element inside of your loop and add the result to the new list. If you want to make that parallel, you just have to start over.

In my implementation, I made a generic Transformer<F,T> interface that has one method: public T transform(F in). Implementing this is the hard part (watch as I leave out the rant about the lack of closures/lambdas being what makes this so difficult).

Given a Transformer<F,T>, and a Collection<F>, you can easily get a List<T> without paying too much attention to how any of it happens. Let's say we have the following Transformer<Integer,String> which provides the transformation described above:

// This is the (\x -> show (x * x)) part from above
Transformer<Integer, String> t=
    new Transformer<Integer, String>() {
    public String transform(Integer in) {
        return String.valueOf(in*in);
    }   
};

Now, all you need is something to apply it. We'll call this Mapper<F,T>. It's got two constructors: An empty constructor, and one that takes an ExecutorService. The former gives a simple FP-style map construct, while the latter allows us to execute the conversion with arbitrary parallelism. Examples of each:

// Simple map
Mapper<Integer, String> m=new Mapper<Integer, String>();
List<Integer> l=Arrays.asList(1, 2, 3, 4, 5);
List<String> s=m.transform(t, intList);

// Parallel map
ExecutorService x=getThreadPool();
Mapper<Integer, String> m=new Mapper<Integer, String>(x);
List<Integer> l=Arrays.asList(1, 2, 3, 4, 5);
List<String> s=m.transform(t, intList);

In both cases, printing out s yields [1, 4, 9, 16, 25].

As a bonus, the thread pool returned by getThreadPool() above may be shared by many parts of your application. When the load is low, this allows you to execute more quickly using more of the available resources (i.e. processors). When the load is high, this allows you to throttle the execution by staying within a finite amount of processing power (thus, keeping the CPU load lower).

blog comments powered by Disqus