Nov. 9, 2006, 12:54 p.m.

Internal Representation of Dates

I was planning to have a bunch of timestamped objects implement an interface that returns a java.util.Date, and I wondered what the storage overhead of a java.util.Date is compared to a long. Keep in mind that by storing a long, I'd need to construct a new java.util.Date each time a caller wanted one. I didn't want to memoize it, since that would defeat much of the purpose of the exercise if these objects were to be long-lived.

So I conducted an experiment. First, I would request a GC and ask the JVM how much memory was in use, to get an idea of how much was required just to start the program (~150k). I would then create an array of n container objects, each holding either a simple long or an object reference, fill the array, request another GC, and measure how much memory was in use afterwards.

The memory usage was computed as (Runtime.totalMemory() - Runtime.freeMemory()). In the server VM, that behaves just...weird: the values don't make any sense. We'll ignore those and look at the consistent values I got from the client VM.
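The measurement described above can be sketched roughly as follows. This is not the original DateTest source: the names MemoryProbe, usedAfterGc, and measure are made up for illustration, and Runtime.gc() is only a request that the JVM is free to ignore, so the numbers are approximate.

```java
import java.util.function.IntFunction;

// Hypothetical harness: MemoryProbe, usedAfterGc, and measure are illustrative
// names, not the original DateTest source.
final class MemoryProbe {

    /** Heap bytes in use after requesting a GC (a hint the JVM may ignore). */
    static long usedAfterGc() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Fills an n-slot array from the factory and returns the growth in used bytes. */
    static long measure(int n, IntFunction<Object> factory) {
        long before = usedAfterGc();
        Object[] items = new Object[n];
        for (int i = 0; i < n; i++) {
            items[i] = factory.apply(i);
        }
        long after = usedAfterGc();
        // Use the array after measuring so it stays reachable across the second GC.
        if (items.length != n) {
            throw new IllegalStateException("array length changed");
        }
        return after - before;
    }

    public static void main(String[] args) {
        int n = 100000;
        long delta = measure(n, i -> null); // control case: an array of nulls
        System.out.println("Delta:  " + delta + " or " + ((double) delta / n) + " each");
    }
}
```

Running each case in a fresh JVM, as in the shell session further down, keeps one case's garbage from skewing the next.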

Now, having read the source code to Sun's java.util.Date implementation, I could see that there were actually two memory profiles for it. In one, it is basically a wrapper around a long. In the other, it holds that long as well as a reference to another object that seems quite a bit heavier. Together with the controls, this leaves me with five test cases:

  1. Array of null
  2. Array of wrapped null objects
  3. Array of wrapped longs
  4. Array of wrapped Date objects
  5. Array of wrapped normalized Date objects
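The containers for these cases might look something like the sketch below. The class names are hypothetical and only echo the labels in the test output. The normalized() helper rests on an assumption about Sun's implementation: that formatting a Date with toString() makes it populate and keep its heavier internal calendar object. Other JVMs may behave differently.

```java
import java.util.Date;

// Hypothetical containers for the test cases; not the original DateTest classes.
final class Holders {

    /** Wraps an object reference: null for WRAPPED_NULL, a Date for DATE and DATE_NORM. */
    static final class ObjectHolder {
        final Object value;
        ObjectHolder(Object value) { this.value = value; }
    }

    /** Wraps a primitive long (the LONG case). */
    static final class LongHolder {
        final long value;
        LongHolder(long value) { this.value = value; }
    }

    /**
     * Returns a Date that has been asked to format itself.
     * Assumption: in Sun's implementation this caches the internal
     * calendar representation inside the Date instance.
     */
    static Date normalized(long millis) {
        Date d = new Date(millis);
        d.toString(); // result discarded; called only for its caching side effect
        return d;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Object[] cases = {
            null,                               // NOTHING
            new ObjectHolder(null),             // WRAPPED_NULL
            new LongHolder(now),                // LONG
            new ObjectHolder(new Date(now)),    // DATE
            new ObjectHolder(normalized(now)),  // DATE_NORM
        };
        System.out.println(cases.length + " cases built");
    }
}
```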

Initially, I was testing with 1,000,000 objects, but when I normalized the dates, I couldn't fit them into memory, so I decided to go with 100,000. The results follow:

dustintmb:/tmp 590% java DateTest ; \
    java DateTest n ; \
    java DateTest l ; \
    java DateTest d ; \
    java DateTest dn
Initial memory in use:  153248.  Doing 100000
NOTHING took 6ms with 553536 in use (2031616 total).
Delta:  400288 or 4.00288 each
0.096u 0.042s 0:00.16 81.2%     0+0k 0+3io 0pf+0w
Initial memory in use:  153288.  Doing 100000
WRAPPED_NULL took 87ms with 3746568 in use (6836224 total).
Delta:  3593280 or 35.9328 each
0.189u 0.051s 0:00.26 88.4%     0+0k 0+0io 0pf+0w
Initial memory in use:  153288.  Doing 100000
LONG took 94ms with 4546568 in use (8171520 total).
Delta:  4393280 or 43.9328 each
0.197u 0.052s 0:00.26 92.3%     0+0k 0+0io 0pf+0w
Initial memory in use:  153288.  Doing 100000
DATE took 156ms with 6147512 in use (11100160 total).
Delta:  5994224 or 59.94224 each
0.254u 0.058s 0:00.36 83.3%     0+0k 0+0io 0pf+0w
Initial memory in use:  153288.  Doing 100000
DATE_NORM took 590ms with 16749904 in use (30081024 total).
Delta:  16596616 or 165.96616 each
0.659u 0.086s 0:00.82 89.0%     0+0k 0+1io 0pf+0w

Of course, jmap is pretty good for this kind of thing, but it doesn't seem to work on my Mac, so I just did it myself. As you can see, the array alone costs 4 bytes per item (one reference slot each), so subtract those 4 bytes from each per-item figure. Taking the array of nulls as the baseline, we get the following adjusted sizes in bytes per item:

  1. 0 (control)
  2. 32
  3. 40
  4. 56
  5. 162
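For anyone who wants to redo the adjustment, here is a small sketch of the arithmetic: divide each delta by the item count, subtract the per-item cost of the bare array, and round to the nearest byte. The deltas are copied from the output above; AdjustedSizes and adjust are hypothetical names.

```java
// Hypothetical helper that recomputes the adjusted per-item sizes.
final class AdjustedSizes {

    /** Per-item size of each case minus the per-item cost of the first (control) case. */
    static long[] adjust(long[] deltas, int n) {
        double baseline = (double) deltas[0] / n; // bytes per array slot
        long[] adjusted = new long[deltas.length];
        for (int i = 0; i < deltas.length; i++) {
            adjusted[i] = Math.round((double) deltas[i] / n - baseline);
        }
        return adjusted;
    }

    public static void main(String[] args) {
        String[] labels = {"NOTHING", "WRAPPED_NULL", "LONG", "DATE", "DATE_NORM"};
        // "Delta" values from the five runs above, each with 100000 items.
        long[] deltas = {400288L, 3593280L, 4393280L, 5994224L, 16596616L};
        long[] adjusted = adjust(deltas, 100000);
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i] + ": " + adjusted[i]);
        }
    }
}
```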

In conclusion, I'll just use the long and let the GC do the extra work if people are asking for these Dates. If that becomes a problem, we'll either get the interface changed to just use the long, or we'll consider dealing with the memoization costs.
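A minimal sketch of that approach, assuming an interface shaped roughly like the one I described at the top. Timestamped, getTimestamp, and Event are placeholder names, not the real interface.

```java
import java.util.Date;

// Hypothetical interface standing in for the real one.
interface Timestamped {
    Date getTimestamp();
}

// Stores only the long and builds a short-lived Date whenever one is requested.
final class Event implements Timestamped {
    private final long millis;

    Event(long millis) {
        this.millis = millis;
    }

    @Override
    public Date getTimestamp() {
        // A fresh Date per call: more garbage for the GC, less memory held per object.
        return new Date(millis);
    }
}
```

Since java.util.Date is mutable, handing out a fresh instance each time also means callers can't change the stored timestamp by accident.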
