July 29, 2006, 9:08 p.m.

More Than You Want to Know About Java Serialization

Serialization is Easy!

Most people who have worked with Java for a while know the basics of Java serialization. If your class implements the tagging interface java.io.Serializable, java.io.ObjectOutputStream will handle writing it to a binary stream (a file, for example) in such a way that java.io.ObjectInputStream can read it back and hand you an equivalent copy of your original object. It's easy.
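For reference, here is a minimal sketch of that round trip. The Point class and the RoundTripDemo wrapper are made-up names for illustration, and I write to a byte array rather than a file to keep it self-contained. It uses try-with-resources, so it assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {

  // Hypothetical example class; any Serializable class would do.
  static class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }
  }

  // Write a single object to an in-memory byte array.
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  // Read a single object back from a byte array.
  static Object deserialize(byte[] data)
    throws IOException, ClassNotFoundException {

    try (ObjectInputStream in =
           new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Point copy = (Point) deserialize(serialize(new Point(3, 4)));
    System.out.println(copy.x + "," + copy.y); // prints 3,4
  }
}
```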

However, this is often insufficient. You'll easily find the serialized forms of your objects entirely too large, or perhaps you'll have fields that either shouldn't be serialized, or should be deserialized in such a way that preserves the singleton nature of certain referenced objects. This is where things get ugly.

Pay No Attention to the Field Behind the Curtain

The first thing you should know about is the transient keyword. According to the JLS,

Variables may be marked transient to indicate that they are not part of the persistent state of an object.

The obvious thing to do is to mark a field as transient when you do not want it to be serialized. However, this can often leave you with a bit of a conundrum:

If transient fields aren't stored or restored during the serialization process, how will they get set?

The naïve programmer (speaking for myself, of course) will find the answer obvious from the following code:

import java.io.Serializable;

public class Test implements Serializable {
  private transient String string1="some string";
  private transient String string2=null;

  public Test() {
    super();
    string2="another string";
  }
}

Obviously, after a round trip through serialization and deserialization, the value of string1 will be "some string" and the value of string2 will be "another string". This, however, is incorrect. Both values will be null. But why?
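If you'd rather see it than take my word for it, here is a small sketch that pushes the class through a byte array and reads it back. TransientDemo and valuesAfterRoundTrip are names I made up for the demonstration, and Test is copied in as a nested class so the whole thing compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

  // Same shape as the Test class above, nested here for convenience.
  static class Test implements Serializable {
    private transient String string1 = "some string";
    private transient String string2 = null;

    Test() {
      super();
      string2 = "another string";
    }
  }

  // Serialize a fresh Test, read it back, and report both fields.
  static String[] valuesAfterRoundTrip()
    throws IOException, ClassNotFoundException {

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new Test());
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray()))) {
      Test copy = (Test) in.readObject();
      return new String[] { copy.string1, copy.string2 };
    }
  }

  public static void main(String[] args) throws Exception {
    String[] values = valuesAfterRoundTrip();
    System.out.println(values[0]); // prints null
    System.out.println(values[1]); // prints null
  }
}
```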

Well, serialization is more complicated than I originally assumed it was. As it turns out, during deserialization neither the field initializers nor the constructors are run for any class that implements java.io.Serializable. Consider the following example class hierarchy:

java.lang.Object 
    -> BaseObject
        -> SomeSerializableObject
            -> SubclassOfSerializable

When deserializing a serialized SubclassOfSerializable instance, none of the initializers or constructors of SubclassOfSerializable or SomeSerializableObject will be invoked. However, the object initializers of java.lang.Object and BaseObject will be invoked along with the no-argument constructor of BaseObject. I'll leave it as an exercise for the reader to figure out how they create instances of classes without calling the constructor (hint: it involves sun.reflect.ClassFileAssembler).
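A rough way to check this yourself is to have every constructor in the hierarchy record its name and then look at what was recorded during a read. In the sketch below, HierarchyDemo, CALLS and constructorsRunOnRead are names I invented. The three classes mirror the hierarchy above, with BaseObject deliberately left non-serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class HierarchyDemo {

  // Records which constructors run, in order.
  static final List<String> CALLS = new ArrayList<>();

  // Not Serializable, so deserialization runs its no-arg constructor.
  static class BaseObject {
    BaseObject() { CALLS.add("BaseObject"); }
  }

  static class SomeSerializableObject extends BaseObject
    implements Serializable {

    SomeSerializableObject() { CALLS.add("SomeSerializableObject"); }
  }

  static class SubclassOfSerializable extends SomeSerializableObject {
    SubclassOfSerializable() { CALLS.add("SubclassOfSerializable"); }
  }

  // Returns the names of the constructors that ran while deserializing.
  static List<String> constructorsRunOnRead()
    throws IOException, ClassNotFoundException {

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new SubclassOfSerializable());
    }
    CALLS.clear(); // forget the calls made by "new" above
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray()))) {
      in.readObject();
    }
    return new ArrayList<>(CALLS);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(constructorsRunOnRead()); // prints [BaseObject]
  }
}
```

One practical consequence: the first non-serializable superclass needs a no-argument constructor that the serializable subclass can reach, or reading the object back fails with an InvalidClassException.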

...But I Want Default Values for My Transients

Now that you know that it doesn't happen, you're probably wondering what you can do about it. Well, the easiest thing to do is to add a method private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException to your class and do post-deserialization initialization there. To avoid code duplication, I usually do something like the following:

import java.io.IOException;
import java.io.Serializable;

public class Test implements Serializable {
  private transient String string1 = null;
  private transient String string2 = null;

  public Test() {
    super();
    initTransients();
  }

  // Initialize all of the transient fields.
  private void initTransients() {
    string1 = "some string";
    string2 = "another string";
  }

  // Used by deserialization to handle the reading of
  // the data and applying what's read to fields.
  private void readObject(java.io.ObjectInputStream in)
    throws IOException, ClassNotFoundException {

    in.defaultReadObject();
    initTransients();
  }
}

As long as initTransients is called from all of your constructors and readObject, you'll have control of the default value of all of your transients.

When Do I Use Transients?

Given the definition of transient, it's not immediately obvious why a field would belong to a class if it didn't represent part of the state of a class that would matter across a serialization cycle. In my personal usage, I've done basically two things with transients:

  1. Things that are for runtime support (loggers, transaction/mutation state, etc.)
  2. Storage efficiency.

I'm going to focus on #2 since that's more interesting. Let's consider a rather simplified example of how one might use transients to more efficiently store an object that references an instance of an object that may be retrieved from a DAO.

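import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TestCustomRef implements Serializable {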

  private String name=null;
  private transient TestDTO obj=null;

  public TestCustomRef(String n, int id) {
    super();
    name=n;
    obj=TestDAO.getObject(id);
  }

  private void writeObject(ObjectOutputStream out)
    throws IOException {

    out.defaultWriteObject();
    // Write out the object ID
    out.writeInt(obj.getId());
  }

  private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException {

    in.defaultReadObject();
    // restore the object instance from the DAO
    obj=TestDAO.getObject(in.readInt());
  }

}

The above example demonstrates the use of writeObject and readObject to do custom serialization of a particular piece of data after the default serialization has run. Also note that in the above example, TestDTO does not have to be serializable itself, because it's not getting serialized.
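Since TestDAO and TestDTO aren't shown, here is a self-contained sketch with stand-ins for both so you can run the idea end to end. The in-memory cache inside TestDAO, the getObj accessor and the CustomRefDemo wrapper are all my assumptions for the sake of the demo. A real DAO would presumably talk to a database.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class CustomRefDemo {

  // Hypothetical DTO; deliberately NOT Serializable.
  static class TestDTO {
    private final int id;
    TestDTO(int id) { this.id = id; }
    int getId() { return id; }
  }

  // Hypothetical DAO stand-in: one cached instance per ID.
  static class TestDAO {
    private static final Map<Integer, TestDTO> CACHE = new HashMap<>();

    static synchronized TestDTO getObject(int id) {
      return CACHE.computeIfAbsent(id, k -> new TestDTO(k));
    }
  }

  static class TestCustomRef implements Serializable {
    private String name = null;
    private transient TestDTO obj = null;

    TestCustomRef(String n, int id) {
      name = n;
      obj = TestDAO.getObject(id);
    }

    String getName() { return name; }
    TestDTO getObj() { return obj; }

    private void writeObject(ObjectOutputStream out) throws IOException {
      out.defaultWriteObject();     // writes name
      out.writeInt(obj.getId());    // writes only the ID of obj
    }

    private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {

      in.defaultReadObject();                 // restores name
      obj = TestDAO.getObject(in.readInt());  // looks obj up again
    }
  }

  static TestCustomRef roundTrip(TestCustomRef ref)
    throws IOException, ClassNotFoundException {

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(ref);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray()))) {
      return (TestCustomRef) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    TestCustomRef copy = roundTrip(new TestCustomRef("example", 7));
    // The copy points at the DAO's instance, not at a duplicate.
    System.out.println(copy.getObj() == TestDAO.getObject(7)); // prints true
  }
}
```

Note that writeObject as written assumes obj is never null. If it can be, you'd want to write a flag or a sentinel ID first.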

Neat, But Can I Make the DTO Serialize Itself Better?

Yes, you can! It is possible to make the DTO serialize itself as little more than its ID (plus other overhead like class names) such that the most current value of the referenced object will always be available when deserializing the containing object. This works by creating a new class to represent a serialization proxy for the DTO, as seen in the following example:

  [uninteresting details of TestDTO implementation]

  // During serialization, this method is invoked to find
  // an object that will be serialized instead of TestDTO
  private Object writeReplace()
    throws ObjectStreamException {

    // Create and return a $SerializedForm
    return new SerializedForm(getId());
  }

  // This class is used to hold a reference to
  // a TestDTO across serialization.
  private static class SerializedForm
    implements Serializable {

    private int id=1;

    public SerializedForm(int i) {
      id=i;
    }

    // When deserializing, this method will be invoked
    // to find a replacement object to return instead
    // of $SerializedForm
    private Object readResolve()
      throws ObjectStreamException {

      // Look up and return the original object
      return TestDAO.getObject(id);
    }
  } // End of $SerializedForm
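
To see the proxy in action, here is a runnable sketch under the same assumptions as before: TestDAO is a made-up in-memory stand-in that hands out one cached instance per ID, and ProxyDemo and roundTrip are names I invented for the demonstration. The writeReplace and readResolve methods follow the fragment above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class ProxyDemo {

  // Hypothetical DAO stand-in: one cached instance per ID.
  static class TestDAO {
    private static final Map<Integer, TestDTO> CACHE = new HashMap<>();

    static synchronized TestDTO getObject(int id) {
      return CACHE.computeIfAbsent(id, k -> new TestDTO(k));
    }
  }

  static class TestDTO implements Serializable {
    private final int id;

    TestDTO(int id) { this.id = id; }
    int getId() { return id; }

    // Swap in the proxy whenever a TestDTO is about to be written.
    private Object writeReplace() throws ObjectStreamException {
      return new SerializedForm(getId());
    }

    private static class SerializedForm implements Serializable {
      private final int id;

      SerializedForm(int i) { id = i; }

      // Swap the proxy back out for the DAO's instance when read.
      private Object readResolve() throws ObjectStreamException {
        return TestDAO.getObject(id);
      }
    }
  }

  static Object roundTrip(Object o)
    throws IOException, ClassNotFoundException {

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    TestDTO original = TestDAO.getObject(42);
    Object copy = roundTrip(original);
    System.out.println(copy == original); // prints true
  }
}
```

Because the lookup happens in readResolve, any class holding a TestDTO field gets this behavior for free, with no writeObject or readObject of its own.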

This Has Been Very Interesting, But I'm Concerned About the Lack of XML in My Serialized Java Objects.

If you really like verbosity (or, in general, want more control over your serialized form), two options are available.

If you want control at the individual object level, you can declare the object Externalizable rather than Serializable and completely control what goes in and out of the stream. Externalizable works very much like Serializable, except that it does not automatically deal with any of the fields within your class.
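
As a minimal sketch of what that looks like, here is a made-up Person class (the class, its fields and the ExternalizableDemo wrapper are all hypothetical) that writes and reads its own fields by hand. One thing to watch for: unlike Serializable, an Externalizable class needs a public no-argument constructor, because deserialization calls it before readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {

  // Hypothetical example class.
  public static class Person implements Externalizable {
    private String name;
    private int age;

    // Required: deserialization invokes this before readExternal.
    public Person() {
    }

    Person(String name, int age) {
      this.name = name;
      this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    // Nothing is written unless we write it here.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
      out.writeUTF(name);
      out.writeInt(age);
    }

    // Fields must be read in the same order they were written.
    @Override
    public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {

      name = in.readUTF();
      age = in.readInt();
    }
  }

  static Person roundTrip(Person p)
    throws IOException, ClassNotFoundException {

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(p);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(bytes.toByteArray()))) {
      return (Person) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Person copy = roundTrip(new Person("Ada", 36));
    System.out.println(copy.getName() + " " + copy.getAge()); // prints Ada 36
  }
}
```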

If you want control at the stream level, then you just need to make your own implementation of java.io.ObjectOutput and do whatever sort of damage you might want to do from there. This will be called by the serialization process for each field that's being written. Of course, you'll need to create a corresponding java.io.ObjectInput to deal with reading it back in. This is likely to be a large and painful process, so you can do that yourself.
