More Than You Want to Know About Java Serialization
Serialization is Easy!
Most people who have worked with Java for a while know the basics of Java serialization. If your class implements the tagging interface java.io.Serializable, java.io.ObjectOutputStream will write it to a binary stream in such a way that java.io.ObjectInputStream can read it back and give you an equivalent copy of your original object. It's easy.
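To make the round trip concrete, here is a minimal sketch that writes an object to an in-memory byte array rather than a file and then reads it back. The Point class and the roundTrip helper are hypothetical names chosen for illustration. Only the java.io classes are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripDemo {

    // Hypothetical example class: implementing the tagging interface is all it takes.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Hypothetical helper: writes obj to a byte array, then reads a new object back from it.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Point original = new Point(3, 4);
        Point copy = roundTrip(original);
        System.out.println(copy.x + "," + copy.y); // prints 3,4
        System.out.println(copy == original);      // prints false
    }
}
```

The copy is a distinct object with the same field values. Swapping the byte-array streams for a FileOutputStream and a FileInputStream gives you the file-based version.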
However, this is often insufficient. You may find the serialized forms of your objects far too large. You may also have fields that shouldn't be serialized at all, or that should be deserialized in a way that preserves the singleton nature of certain referenced objects. This is where things get ugly.
Pay No Attention to the Field Behind the Curtain
The first thing you should know about is the transient keyword. According to the JLS,
Variables may be marked transient to indicate that they are not part of the persistent state of an object.
The obvious thing to do is to mark a field transient when you do not want it to be serialized. However, this can leave you with a bit of a conundrum:
If transient fields aren't stored or restored during the serialization process, how will they get set?
The naïve programmer (speaking for myself, of course) will find the answer obvious from the following code:
import java.io.Serializable;

public class Test implements Serializable {
    private transient String string1 = "some string";
    private transient String string2 = null;

    public Test() {
        super();
        string2 = "another string";
    }
}
Obviously, after deserialization, the value of string1 will be "some string" and the value of string2 will be "another string". This, however, is incorrect. Both values will be null. But why?
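You can check this with a quick experiment. The following sketch serializes the Test class from above and prints both transient fields after reading the object back. The roundTrip helper is a hypothetical in-memory utility. The fields are package-private here only so that main can read them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // The Test class from the text (fields made package-private for the demo).
    static class Test implements Serializable {
        private static final long serialVersionUID = 1L;
        transient String string1 = "some string";
        transient String string2 = null;

        Test() {
            super();
            string2 = "another string";
        }
    }

    // Hypothetical helper: serialize to a byte array and read it straight back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Test copy = roundTrip(new Test());
        // Neither the field initializer nor the constructor ran while reading.
        System.out.println(copy.string1); // prints null
        System.out.println(copy.string2); // prints null
    }
}
```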
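Well, serialization is more complicated than I originally assumed. As it turns out, deserialization never runs the constructors or instance initializers (field initializers included) of the object's serializable classes. Transient fields therefore simply keep their default values (null, 0, or false). Consider the following example class hierarchy: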
java.lang.Object
  -> BaseObject
    -> SomeSerializableObject
      -> SubclassOfSerializable
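Here BaseObject does not implement Serializable. SomeSerializableObject does, and SubclassOfSerializable inherits it. When deserializing a serialized SubclassOfSerializable instance, none of the initializers or constructors of SubclassOfSerializable or SomeSerializableObject will be invoked. However, the object initializers of java.lang.Object and BaseObject will be invoked, along with the no-arg constructor of BaseObject. That constructor therefore has to exist and be accessible, or deserialization fails. In other words, deserialization runs the no-arg constructor of the first non-serializable superclass and skips everything below it. I'll leave it as an exercise for the reader to figure out how they create instances of classes without calling the constructor (hint: it involves sun.reflect.ClassFileAssembler).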
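The following sketch makes the rule visible by recording every initializer and constructor that runs. The class names mirror the hierarchy above. The CALLS list and the callsDuringDeserialization method are hypothetical additions for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ConstructorTraceDemo {

    // Records, in order, every initializer and constructor that runs.
    static final List<String> CALLS = new ArrayList<>();

    // Not serializable, so it needs an accessible no-arg constructor.
    static class BaseObject {
        { CALLS.add("BaseObject initializer"); }

        BaseObject() {
            CALLS.add("BaseObject()");
        }
    }

    // The first serializable class in the hierarchy.
    static class SomeSerializableObject extends BaseObject implements Serializable {
        private static final long serialVersionUID = 1L;

        { CALLS.add("SomeSerializableObject initializer"); }

        SomeSerializableObject() {
            CALLS.add("SomeSerializableObject()");
        }
    }

    // Serializable by inheritance.
    static class SubclassOfSerializable extends SomeSerializableObject {
        private static final long serialVersionUID = 1L;

        { CALLS.add("SubclassOfSerializable initializer"); }

        SubclassOfSerializable() {
            CALLS.add("SubclassOfSerializable()");
        }
    }

    // Serializes a fresh instance, then reports what ran while reading it back.
    static List<String> callsDuringDeserialization() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new SubclassOfSerializable());
            }
            CALLS.clear(); // discard the calls made by `new` above
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return new ArrayList<>(CALLS);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callsDuringDeserialization());
        // prints [BaseObject initializer, BaseObject()]
    }
}
```

If you make BaseObject's constructor private, readObject fails with an InvalidClassException ("no valid constructor") instead.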
...But I Want Default Values for My Transients
Now that you know it doesn't happen, you're probably wondering what you can do about it. The easiest thing to do is to add a method private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException to your class and do post-deserialization initialization there. The method must be private and have exactly this signature; otherwise the serialization machinery silently ignores it. To avoid code duplication, I usually do something like the following:
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class Test implements Serializable {
    private transient String string1 = null;
    private transient String string2 = null;

    public Test() {
        super();
        initTransients();
    }

    // Initialize all of the transient fields.
    private void initTransients() {
        string1 = "some string";
        string2 = "another string";
    }

    // Used by deserialization to read the serialized data and
    // apply it to the non-transient fields, then reinitialize
    // the transients.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        initTransients();
    }
}
As long as initTransients is called from all of your constructors and from readObject, you'll have control over the default values of all of your transients.
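To confirm that the pattern works, this sketch runs the Test class through a serialization round trip. The class is extended with a hypothetical non-transient name field, and the roundTrip helper is again a hypothetical in-memory utility. After reading, defaultReadObject has restored name and initTransients has repopulated the transients.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class InitTransientsDemo {

    // The Test class from the text, plus a hypothetical non-transient field.
    static class Test implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient String string1 = null;
        transient String string2 = null;

        Test(String name) {
            super();
            this.name = name;
            initTransients();
        }

        // Initialize all of the transient fields.
        private void initTransients() {
            string1 = "some string";
            string2 = "another string";
        }

        // Restore the serialized fields, then re-create the transients.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            initTransients();
        }
    }

    // Hypothetical helper: serialize to a byte array and read it straight back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Test copy = roundTrip(new Test("example"));
        System.out.println(copy.name);    // prints example
        System.out.println(copy.string1); // prints some string
        System.out.println(copy.string2); // prints another string
    }
}
```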
When Do I Use Transients?
Given the definition of transient, it's not immediately obvious why a class would have a field that isn't part of the state that matters across a serialization cycle. In my own code, I've used transients for basically two things:
- Things that are for runtime support (loggers, transaction/mutation state, etc.)
- Storage efficiency.
I'm going to focus on #2, since that's more interesting. Let's consider a rather simplified example of how one might use transients to store an object more efficiently when it references another object that can be retrieved from a DAO by its ID.
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TestCustomRef implements Serializable {
    private String name = null;
    private transient TestDTO obj = null;

    public TestCustomRef(String n, int id) {
        super();
        name = n;
        obj = TestDAO.getObject(id);
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Write out the object ID
        out.writeInt(obj.getId());
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Restore the object instance from the DAO, using the ID
        obj = TestDAO.getObject(in.readInt());
    }
}
The above example demonstrates the use of writeObject and readObject to do custom serialization of a particular piece of data after the normal serialization work done by defaultWriteObject and defaultReadObject. readObject must read the extra data back in the same order that writeObject wrote it. Also note that in the above example, TestDTO does not have to be serializable itself, because it never gets serialized; only its ID does.
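Because TestDTO and TestDAO are not shown, the following self-contained sketch supplies hypothetical minimal versions of both so the TestCustomRef pattern can be run end to end. TestDAO is an in-memory map keyed by ID. TestDTO deliberately does not implement Serializable, and the getters on TestCustomRef are additions for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class CustomRefDemo {

    // Hypothetical DTO: deliberately NOT Serializable.
    static final class TestDTO {
        private final int id;
        private final String payload;

        TestDTO(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        int getId() { return id; }
        String getPayload() { return payload; }
    }

    // Hypothetical stand-in for a real DAO: an in-memory map keyed by ID.
    static final class TestDAO {
        private static final Map<Integer, TestDTO> STORE = new HashMap<>();
        static {
            STORE.put(42, new TestDTO(42, "the answer"));
        }

        static TestDTO getObject(int id) {
            return STORE.get(id);
        }
    }

    // The class from the text, with getters added for the demo.
    static class TestCustomRef implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        private transient TestDTO obj;

        TestCustomRef(String n, int id) {
            name = n;
            obj = TestDAO.getObject(id);
        }

        String getName() { return name; }
        TestDTO getObj() { return obj; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // writes name
            out.writeInt(obj.getId()); // then only the referenced object's ID
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();                // reads name
            obj = TestDAO.getObject(in.readInt()); // re-fetch the DTO by ID
        }
    }

    // Hypothetical helper: serialize to a byte array and read it straight back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TestCustomRef copy = roundTrip(new TestCustomRef("ref", 42));
        System.out.println(copy.getName() + " -> " + copy.getObj().getPayload()); // prints ref -> the answer
    }
}
```

Because the DTO is looked up again rather than copied, the deserialized reference points at whatever the DAO currently holds for that ID.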
Neat, But Can I Make the DTO Serialize Itself Better?
Yes, you can! It is possible to make the DTO serialize itself to little more than its ID (plus some overhead, such as the proxy's class name), so that deserializing the containing object always gives you the most current value of the referenced object. This works by creating a new class to act as a serialization proxy for the DTO. For this to work, TestDTO itself must implement Serializable, because writeReplace is only consulted for serializable classes. Here is the relevant part of TestDTO:
// [uninteresting details of TestDTO implementation]

// During serialization, this method is invoked to find
// an object that will be serialized instead of TestDTO.
private Object writeReplace() throws ObjectStreamException {
    // Create and return a SerializedForm
    return new SerializedForm(getId());
}

// This class is used to hold a reference to
// a TestDTO across serialization.
private static class SerializedForm implements Serializable {
    private final int id;

    public SerializedForm(int i) {
        id = i;
    }

    // When deserializing, this method will be invoked
    // to find a replacement object to return instead
    // of the SerializedForm.
    private Object readResolve() throws ObjectStreamException {
        // Look up and return the original object
        return TestDAO.getObject(id);
    }
} // End of SerializedForm
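Here is a runnable sketch of the same proxy idea. TestDAO is again a hypothetical in-memory map that hands out one instance per ID, and the bulkyData field is a hypothetical stand-in for state you don't want in the stream. Because readResolve returns whatever the DAO holds, the deserialized reference should be the very same instance the DAO hands out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SerializationProxyDemo {

    // Hypothetical in-memory DAO: always returns the same instance for a given ID.
    static final class TestDAO {
        private static final Map<Integer, TestDTO> STORE = new ConcurrentHashMap<>();

        static TestDTO getObject(int id) {
            return STORE.computeIfAbsent(id, k -> new TestDTO(k, "bulky data for " + k));
        }
    }

    // Must implement Serializable, or writeReplace is never consulted.
    static final class TestDTO implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int id;
        private final String bulkyData; // never reaches the stream

        TestDTO(int id, String bulkyData) {
            this.id = id;
            this.bulkyData = bulkyData;
        }

        int getId() { return id; }
        String getBulkyData() { return bulkyData; }

        // Serialize a small proxy instead of this object.
        private Object writeReplace() throws ObjectStreamException {
            return new SerializedForm(getId());
        }

        // Holds only the ID across serialization.
        private static class SerializedForm implements Serializable {
            private static final long serialVersionUID = 1L;
            private final int id;

            SerializedForm(int id) {
                this.id = id;
            }

            // Swap the proxy for the live object when deserializing.
            private Object readResolve() throws ObjectStreamException {
                return TestDAO.getObject(id);
            }
        }
    }

    // Hypothetical helper: serialize to a byte array and read it straight back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TestDTO dto = TestDAO.getObject(7);
        TestDTO copy = roundTrip(dto);
        System.out.println(copy == dto); // prints true
    }
}
```

The same writeReplace/readResolve pairing is how you would preserve singletons across serialization: readResolve simply returns the canonical instance.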
This Has Been Very Interesting, But I'm Concerned About the Lack of XML in My Serialized Java Objects.
If you really like verbosity (or, in general, want more control over your serialized form), two options are available.
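If you want control at the individual object level, you can declare the object Externalizable rather than Serializable and completely control what goes in and out of the stream. Externalizable works very much like Serializable, except that it does not automatically deal with any of the fields within your class. You write and read them yourself in writeExternal and readExternal. Also, unlike Serializable, an Externalizable class must have a public no-arg constructor, and that constructor is called during deserialization before readExternal.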
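A minimal Externalizable sketch follows, using a hypothetical Person class. Nothing reaches the stream unless writeExternal writes it, and readExternal has to read the values back in the same order. The roundTrip helper is the same hypothetical in-memory utility used for the Serializable examples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableDemo {

    // Hypothetical example class.
    public static class Person implements Externalizable {
        private static final long serialVersionUID = 1L;
        private String name;
        private int age;

        // Required: called during deserialization, before readExternal.
        public Person() { }

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }

        // Nothing is written automatically; each field is written explicitly.
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name); // assumes name is non-null
            out.writeInt(age);
        }

        // Fields must be read back in the same order they were written.
        @Override
        public void readExternal(ObjectInput in) throws IOException {
            name = in.readUTF();
            age = in.readInt();
        }
    }

    // Hypothetical helper: serialize to a byte array and read it straight back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Person copy = roundTrip(new Person("Ada", 36));
        System.out.println(copy.getName() + " " + copy.getAge()); // prints Ada 36
    }
}
```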
If you want control at the stream level, then you need to write your own implementation of java.io.ObjectOutput and do whatever sort of damage you like from there. Every value written through it (for example, by an Externalizable object's writeExternal method) goes through your code, which decides how it is encoded. Of course, you'll also need a corresponding java.io.ObjectInput to read it back in. This is likely to be a large and painful process, so I'll leave that one to you.