Internal Representation of Dates
I was planning to have a bunch of timestamped objects implementing an interface that returns a java.util.Date, and I was wondering what the storage overhead of a java.util.Date is compared with a long. Keep in mind that by storing a long, I'd need to construct a new java.util.Date every time a caller wanted one. I didn't want to memoize it, since that would defeat a lot of the purpose of the exercise if these things were to be long lived.
So I conducted an experiment. Basically, I would request a GC, and then ask the JVM how much memory was in use at the beginning, to get an idea of how much was required just to start the program (~150k). I would then create an array of n container objects, each holding either a simple long or an object, fill the array, request another GC, and then find out how much memory was in use afterwards.
The memory usage was computed using (Runtime.totalMemory() - Runtime.freeMemory()). In the server VM, that behaves just... weird. The values don't make any sense. We'll ignore those and look at the consistent values I got from the client VM.
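The measurement described above can be sketched roughly as follows. This is not the original DateTest source (which isn't shown here); the class name MemoryProbe, the Holder container, and the method names are made up for illustration. Note that System.gc() is only a request, so the numbers are approximate at best.

```java
class MemoryProbe {
    /** Heap currently in use, computed as totalMemory() - freeMemory(). */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Request a GC (the JVM may ignore it), then report the heap in use. */
    static long usedAfterGc() {
        System.gc();
        return usedMemory();
    }

    /** Hypothetical container holding a plain long. */
    static final class Holder {
        final long value;
        Holder(long value) { this.value = value; }
    }

    /** Approximate heap bytes per item for an array of n filled containers. */
    static double bytesPerItem(int n) {
        long before = usedAfterGc();
        Holder[] items = new Holder[n];
        for (int i = 0; i < n; i++) {
            items[i] = new Holder(i);
        }
        long after = usedAfterGc();
        // Touch the array after measuring so it stays reachable during the GC.
        if (items[n - 1] == null) {
            throw new IllegalStateException("array was not filled");
        }
        return (after - before) / (double) n;
    }

    public static void main(String[] args) {
        System.out.println("Initial memory in use: " + usedAfterGc());
        System.out.println("Approx. bytes each: " + bytesPerItem(100000));
    }
}
```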
Now, having read the source code to Sun's java.util.Date implementation, I could see that there are actually two memory profiles for it: one where it is basically a wrapper around a long, and another where it holds that long as well as a reference to another object that seems quite a bit heavier. This leaves me with five test cases:
- Array of null
- Array of wrapped null objects
- Array of wrapped longs
- Array of wrapped Date objects
- Array of wrapped normalized Date objects
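To make the last two cases concrete, here is a small sketch of how a Date might end up in each state. How the original test normalized its dates isn't shown, so this is an assumption: in the Sun/OpenJDK sources I've read, calling toString() (or one of the deprecated field getters) normalizes the Date and leaves a cached calendar object behind, while getTime() alone does not. The class and method names are hypothetical.

```java
import java.util.Date;

class DateStates {
    /** A freshly constructed Date: essentially just the wrapped long. */
    static Date plain(long millis) {
        return new Date(millis);
    }

    /**
     * A Date that has been asked for a calendar-based view of itself.
     * Assumption: in Sun/OpenJDK implementations toString() triggers
     * normalization, which caches an extra calendar object inside the Date.
     */
    static Date normalized(long millis) {
        Date d = new Date(millis);
        d.toString(); // result discarded; called only for its side effect
        return d;
    }

    public static void main(String[] args) {
        Date a = plain(0L);
        Date b = normalized(0L);
        // Same instant either way; only the internal footprint may differ.
        System.out.println(a.equals(b));
    }
}
```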
Initially, I was testing with 1,000,000 objects, but when I normalized the dates, I couldn't fit them into memory, so I decided to go with 100,000. The results follow:
    dustintmb:/tmp 590% java DateTest ; \
        java DateTest n ; \
        java DateTest l ; \
        java DateTest d ; \
        java DateTest dn
    Initial memory in use: 153248.
    Doing 100000 NOTHING took 6ms with 553536 in use (2031616 total).
    Delta: 400288 or 4.00288 each
    0.096u 0.042s 0:00.16 81.2% 0+0k 0+3io 0pf+0w
    Initial memory in use: 153288.
    Doing 100000 WRAPPED_NULL took 87ms with 3746568 in use (6836224 total).
    Delta: 3593280 or 35.9328 each
    0.189u 0.051s 0:00.26 88.4% 0+0k 0+0io 0pf+0w
    Initial memory in use: 153288.
    Doing 100000 LONG took 94ms with 4546568 in use (8171520 total).
    Delta: 4393280 or 43.9328 each
    0.197u 0.052s 0:00.26 92.3% 0+0k 0+0io 0pf+0w
    Initial memory in use: 153288.
    Doing 100000 DATE took 156ms with 6147512 in use (11100160 total).
    Delta: 5994224 or 59.94224 each
    0.254u 0.058s 0:00.36 83.3% 0+0k 0+0io 0pf+0w
    Initial memory in use: 153288.
    Doing 100000 DATE_NORM took 590ms with 16749904 in use (30081024 total).
    Delta: 16596616 or 165.96616 each
    0.659u 0.086s 0:00.82 89.0% 0+0k 0+1io 0pf+0w
Of course, jmap is pretty good for this kind of thing, but it doesn't seem to work on my Mac, so I just did it myself. As you can see, it takes 4 bytes per item just to have the array (one reference slot each), so subtract 4 bytes from each per-item figure. Taking the null array as the baseline, we get the following adjusted sizes, in bytes per item:
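- Array of null: 0 (control)
- Array of wrapped null objects: 32
- Array of wrapped longs: 40
- Array of wrapped Date objects: 56
- Array of wrapped normalized Date objects: 162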
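For anyone who wants to check the arithmetic, the adjustment is just (delta - baseline delta) / count, rounded to the nearest byte. A quick sketch using the deltas from the run above (the class and method names are mine, not part of DateTest):

```java
class AdjustedSizes {
    static final int COUNT = 100000;             // objects per run
    static final long BASELINE_DELTA = 400288L;  // delta of the NOTHING run

    /** Per-item size in bytes after removing the cost of the array itself. */
    static long adjustedBytesPerItem(long delta) {
        return Math.round((delta - BASELINE_DELTA) / (double) COUNT);
    }

    public static void main(String[] args) {
        String[] names = {"WRAPPED_NULL", "LONG", "DATE", "DATE_NORM"};
        long[] deltas = {3593280L, 4393280L, 5994224L, 16596616L};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + adjustedBytesPerItem(deltas[i]));
        }
    }
}
```

With the deltas reported above, this comes out to 32, 40, 56 and 162 bytes per item.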
In conclusion, I'll just use the long and let the GC do the extra work if people are asking for these Dates. If that becomes a problem, we'll either get the interface changed to just use the long, or we'll consider dealing with the memoization costs.
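What I have in mind looks roughly like this. The interface and class names here (Timestamped, LongTimestamped) are placeholders rather than the real ones; the point is only that the object stores a long and builds a fresh Date whenever someone asks.

```java
import java.util.Date;

/** Placeholder for the real interface, which returns a java.util.Date. */
interface Timestamped {
    Date getTimestamp();
}

/** Stores only the epoch milliseconds and creates a Date on demand. */
class LongTimestamped implements Timestamped {
    private final long millis;

    LongTimestamped(long millis) {
        this.millis = millis;
    }

    @Override
    public Date getTimestamp() {
        // Not memoized: each call allocates a short-lived Date for the GC.
        return new Date(millis);
    }
}

class LongTimestampedDemo {
    public static void main(String[] args) {
        Timestamped t = new LongTimestamped(System.currentTimeMillis());
        System.out.println(t.getTimestamp());
    }
}
```

A side effect of not memoizing is that callers can't mutate the stored timestamp through the returned Date, since each one is a throwaway copy.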