Tuesday 15 May 2007

Java Object Serialization

Java Object Serialization is an often confusing subject for me. I first learned about it during my Java training. Later, when I had to use it for persisting session information for our container, I read more about it and went through Sun's JDK source. While going through it I found something interesting, which led me to think that Java serialization can break in some cases. I may be wrong, so I would like readers of this blog to explain where I went wrong. Let us start with an example. ClassA has a writeReplace method which replaces itself with an instance of ClassB.

import java.io.Serializable;

public class ClassA
        implements Serializable
{
    private String classAData;

    public ClassA(String classAData)
    {
        this.classAData = classAData;
    }

    public String getClassAData()
    {
        return classAData;
    }

    public Object writeReplace()
    {
        System.out.println("ClassA.WriteReplace [" + this + "]");
        return new ClassB(classAData);
    }

    public Object readResolve()
    {
        System.out.println("ClassA.readResolve --> Nothing to do");
        return this;
    }

    public String toString()
    {
        return "Class A: " + classAData;
    }
}
ClassB does the reverse, i.e. it has a readResolve method which replaces itself with an instance of ClassA.
import java.io.Serializable;

public class ClassB
        implements Serializable
{
    private String classBData;

    public ClassB(String classBData)
    {
        this.classBData = classBData;
    }

    public String getClassBData()
    {
        return classBData;
    }

    public Object writeReplace()
    {
        System.out.println("ClassB.writeReplace -->  Nothing to do");
        return this;
    }

    public Object readResolve()
    {
        System.out.println("ClassB.readResolve [" + this + "]");
        return new ClassA(classBData);
    }

    public String toString()
    {
        return "Class B: " + classBData;
    }
}
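ClassC has writeObject and readObject methods which write its variable 'classCData' to, and read it from, the object stream respectively.
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClassC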
        implements Serializable
{
    private String classCData;

    public ClassC(String classCData)
    {
        this.classCData = classCData;
    }

    public String getClassCData()
    {
        return classCData;
    }

    private void writeObject(ObjectOutputStream out) throws IOException
    {
        System.out.println("ClassC.writeObject");
        out.writeObject("SU:" + classCData);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException
    {
        System.out.println("ClassC.readObject");
        String str = (String) in.readObject();
        classCData = str.substring(3);
    }

    public Object writeReplace()
    {
        System.out.println("ClassC.writeReplace --> Nothing to do");
        return this;
    }

    public Object readResolve()
    {
        System.out.println("ClassC.readResolve --> Nothing to do");
        return this;
    }

    public String toString()
    {
        return "Class C: " + classCData;
    }
}
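CustomObjectInputStream has a resolveObject method which replaces an object of ClassC with an object of ClassB.
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;

public class CustomObjectInputStream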
        extends ObjectInputStream
{
    public CustomObjectInputStream(InputStream in)
            throws IOException
    {
        super(in);
        enableResolveObject(true);
    }

    protected Object resolveObject(Object obj) throws IOException
    {
        if (obj instanceof ClassA || obj instanceof ClassB || obj instanceof ClassC) {
            System.out.println("CustomObjectInputStream.resolveObject [" + obj + "]");
        }
        if (obj instanceof ClassC) {
            return new ClassB(((ClassC) obj).getClassCData());
        }
        return obj;
    }
}
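CustomObjectOutputStream does the reverse, i.e. it has a replaceObject method which replaces an object of ClassB with an object of ClassC.
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

public class CustomObjectOutputStream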
        extends ObjectOutputStream
{
    public CustomObjectOutputStream(OutputStream out)
            throws IOException
    {
        super(out);
        enableReplaceObject(true);
    }

    protected Object replaceObject(Object obj) throws IOException
    {
        if (obj instanceof ClassA || obj instanceof ClassB || obj instanceof ClassC) {
            System.out.println("CustomObjectOutputStream.replaceObject [" + obj + "]");
        }
        if (obj instanceof ClassB) {
            return new ClassC(((ClassB) obj).getClassBData());
        }
        return obj;
    }
}
My Main class looks like this:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class Main
{
    public static void main(String[] args) throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream byteOutStream = new ByteArrayOutputStream();
        CustomObjectOutputStream objOutStream = new CustomObjectOutputStream(byteOutStream);

        ClassA obj = new ClassA("Test***String");
        objOutStream.writeObject(obj);
        objOutStream.flush();
        byte[] data = byteOutStream.toByteArray();


        ByteArrayInputStream byteInStream = new ByteArrayInputStream(data);
        CustomObjectInputStream objInStream = new CustomObjectInputStream(byteInStream);
        Object reCreatedObj = objInStream.readObject();
        if (reCreatedObj instanceof ClassA) {
            System.out.println("Class is correct");
            if (obj.getClassAData().equals(((ClassA) reCreatedObj).getClassAData())) {
                System.out.println("data also matched");
            } else {
                System.out.println("data Didnt matched");
            }
        } else {
            System.out.println("Read and write DataObject Failed. returned Object [" + reCreatedObj + "]");
        }
    }
}
I expected the output to be:
ClassA.WriteReplace [Class A: Test***String]
ClassB.writeReplace -->  Nothing to do
CustomObjectOutputStream.replaceObject [Class B: Test***String]
ClassC.writeObject
ClassC.readObject
CustomObjectInputStream.resolveObject [Class C: Test***String]
ClassB.readResolve [Class B: Test***String]
ClassA.readResolve --> Nothing to do
Class is correct
data also matched
but the actual output is:
ClassA.WriteReplace [Class A: Test***String]
ClassB.writeReplace -->  Nothing to do
CustomObjectOutputStream.replaceObject [Class B: Test***String]
ClassC.writeObject
ClassC.readObject
ClassC.readResolve --> Nothing to do
CustomObjectInputStream.resolveObject [Class C: Test***String]
Read and write DataObject Failed. returned Object [Class B: Test***String]
Going through the ObjectOutputStream and ObjectInputStream source code, you will find the following sequence for persisting and resolving an object. While converting an object into a series of bytes:
  1. Check whether the ObjectOutputStream was created through the protected no-argument constructor, which is meant for subclasses that reimplement the stream. If yes, just call writeObjectOverride(Object obj) and return. (In our case, CustomObjectOutputStream does not use that constructor, so go to step 2.)
  2. If not, check whether the class of the object to be persisted defines a writeReplace method. If yes, replace the incoming object with the returned object. (In our case, ClassA defines writeReplace. After this invocation, the object to be persisted is of type ClassB.) Repeat step 2 if the returned Object has a writeReplace(), until an object returned doesn't override writeReplace(). (In the JDK source the loop also stops when writeReplace() returns null or an object of the same class it was called on, which is why ClassB.writeReplace is invoked only once in the output above.)
  3. Then check whether replaceObject(Object obj) should be called on the ObjectOutputStream. If yes, pass the object returned above and get the new object to be persisted. This method is invoked only when the ObjectOutputStream subclass has called enableReplaceObject(true). (enableReplaceObject is set for our CustomObjectOutputStream, so invoking the method with an object of type ClassB returns an object of type ClassC. As the output above shows, writeReplace is not invoked on this substitute.)
  4. Then check whether the replaced object from the above method implements Serializable/Externalizable. If not, throw a NotSerializableException. (ClassC implements Serializable.)
  5. Otherwise, write the class description first. Then, if the object implements Externalizable, invoke writeExternal; if it implements Serializable, invoke writeObject() if defined, or else write its non-transient, non-static fields by default. (The object of type ClassC is converted into bytes using writeObject.)
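The write-side order in the steps above can be checked with a much smaller program. This is only a sketch: the class names First, Second, Third and ReplacingStream are made up for illustration and are not part of the example in this post, and it assumes Java 7 or later for the diamond and try-with-resources syntax. It records which hooks fire when a First object is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class WriteOrderDemo {
    // Records the order in which the serialization hooks fire.
    static final List<String> EVENTS = new ArrayList<>();

    static class First implements Serializable {
        private Object writeReplace() {
            EVENTS.add("First.writeReplace");
            return new Second(); // different class: the stream checks Second next
        }
    }

    static class Second implements Serializable {
        private Object writeReplace() {
            EVENTS.add("Second.writeReplace");
            return this; // same class: the writeReplace chain stops here
        }
    }

    static class Third implements Serializable {
        private Object writeReplace() {
            EVENTS.add("Third.writeReplace");
            return this;
        }
    }

    // Hypothetical stream that swaps Second for Third, like replaceObject in the post.
    static class ReplacingStream extends ObjectOutputStream {
        ReplacingStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true);
        }

        @Override
        protected Object replaceObject(Object obj) {
            if (obj instanceof Second) {
                EVENTS.add("stream.replaceObject");
                return new Third();
            }
            return obj;
        }
    }

    public static List<String> run() throws IOException {
        EVENTS.clear();
        try (ObjectOutputStream out = new ReplacingStream(new ByteArrayOutputStream())) {
            out.writeObject(new First());
        }
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

Running it should print [First.writeReplace, Second.writeReplace, stream.replaceObject]. Third.writeReplace never appears, which matches the missing "ClassC.writeReplace" line in the output of the original example.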
While recreating the object from the bytes created above:
  1. Check whether the ObjectInputStream was created through the protected no-argument constructor. If yes, call readObjectOverride() and return. (In our case, CustomObjectInputStream does not use that constructor, so go to step 2.)
  2. If not, read the class description first. If the constructed instance is not of a Serializable type, the data is considered corrupt and an InvalidClassException is thrown. (An object of ClassC is created.)
  3. Then check whether the class implements Externalizable or Serializable and invoke readExternal or readObject() respectively to populate the instance. (As ClassC implements Serializable, readObject() is invoked.)
  4. Then invoke the readResolve method if the class of the above instance defines one. (Here the ClassC readResolve() method is invoked, which doesn't do anything.) Unlike writeReplace(), this call is not repeated on the returned object.
  5. Then check whether resolveObject should be called on the ObjectInputStream, which happens only when the subclass has called enableResolveObject(true). If yes, pass in the object and return the result as the object constructed from the persisted bytes. (As our stream has resolveObject enabled, we pass an object of type ClassC, so the returned object is of type ClassB.)
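The read-side order can be checked the same way. Again this is just a sketch with made-up names (Payload, Substitute, ResolvingStream), assuming Java 7 or later. It writes a Payload with a plain ObjectOutputStream and reads it back through a stream whose resolveObject swaps it for a Substitute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ReadOrderDemo {
    // Records the order in which the deserialization hooks fire.
    static final List<String> EVENTS = new ArrayList<>();

    static class Payload implements Serializable {
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            EVENTS.add("Payload.readObject");
        }

        private Object readResolve() {
            EVENTS.add("Payload.readResolve");
            return this;
        }
    }

    static class Substitute implements Serializable {
        private Object readResolve() {
            EVENTS.add("Substitute.readResolve"); // expected to stay unrecorded
            return this;
        }
    }

    // Hypothetical stream that swaps Payload for Substitute, like resolveObject in the post.
    static class ResolvingStream extends ObjectInputStream {
        ResolvingStream(InputStream in) throws IOException {
            super(in);
            enableResolveObject(true);
        }

        @Override
        protected Object resolveObject(Object obj) {
            if (obj instanceof Payload) {
                EVENTS.add("stream.resolveObject");
                return new Substitute();
            }
            return obj;
        }
    }

    public static List<String> run() throws IOException, ClassNotFoundException {
        EVENTS.clear();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Payload());
        }
        try (ObjectInputStream in = new ResolvingStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Object result = in.readObject();
            EVENTS.add("returned " + result.getClass().getSimpleName());
        }
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        System.out.println(run());
    }
}
```

This should print [Payload.readObject, Payload.readResolve, stream.resolveObject, returned Substitute]. The stream hands back the Substitute without ever calling its readResolve.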
So when we compare the persisted object (type ClassA) and the recreated object (type ClassB), they don't match. If Sun's Java serialization had been written so that steps 4 and 5 of the read sequence were interchanged, the result would have been exactly what I expected. I don't know why Sun's Java serialization is implemented in this order. If anyone knows, please let me know in the comments. PS: I tried the above example on Sun JDK 1.5.0_10.
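One possible way to get the behaviour I expected, without changing the JDK, is to let resolveObject apply the substitute's readResolve itself. The following is only a sketch under that assumption, with made-up stand-in classes (Stored, Middle, Target play the roles of ClassC, ClassB and ClassA). It relies on readResolve being public, as it is in the classes above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ManualResolveDemo {
    // Stand-in for ClassC: the type that is actually in the byte stream.
    static class Stored implements Serializable {
        final String data;
        Stored(String data) { this.data = data; }
    }

    // Stand-in for ClassB: resolves itself into a Target.
    static class Middle implements Serializable {
        final String data;
        Middle(String data) { this.data = data; }
        public Object readResolve() { return new Target(data); }
    }

    // Stand-in for ClassA: the type the caller wants back.
    static class Target implements Serializable {
        final String data;
        Target(String data) { this.data = data; }
    }

    static class ChainingStream extends ObjectInputStream {
        ChainingStream(InputStream in) throws IOException {
            super(in);
            enableResolveObject(true);
        }

        @Override
        protected Object resolveObject(Object obj) {
            if (obj instanceof Stored) {
                Middle substitute = new Middle(((Stored) obj).data);
                // The stream does not call readResolve on a substitute, so call it by hand.
                return substitute.readResolve();
            }
            return obj;
        }
    }

    // Writes a Stored and reads it back, returning "<type>:<data>" of the result.
    public static String roundTrip(String data) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Stored(data));
        }
        try (ObjectInputStream in = new ChainingStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Object result = in.readObject();
            String value = (result instanceof Target) ? ((Target) result).data : "?";
            return result.getClass().getSimpleName() + ":" + value;
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        System.out.println(roundTrip("Test***String"));
    }
}
```

With this, roundTrip("Test***String") should return "Target:Test***String". The drawback is that the stream has to know which substitutes need resolving, so this is a workaround for the example rather than a general fix.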

3 comments :

fabrizio said...

Hi Surya

I don't understand why U want to make life so complicated :)
Anyway, joking apart, I hope I have understood what U mean:

The result that U expected can't happen, because readResolve and resolveObject are invoked ONLY when an object has been read by the ObjectInputStream, so the order is:

readObject
readResolve
resolveObject

That is why U get "CustomObjectInputStream.resolveObject [Class C: Test***String]" after "ClassC.readObject".

Now U expect that ClassB.readResolve and then ClassA.readResolve have to be invoked when you create ClassB and ClassA, but creating them is not enough, because those methods are invoked only after the stream has read the object.

If U change this code in CustomObjectInputStream:
.
.
.
if (obj instanceof ClassC) {
return new ClassB(((ClassC) obj).getClassCData());
.
.
.

to:

.
.
.
if (obj instanceof ClassC) {
return new ClassA(((ClassC) obj).getClassCData());
.
.
.

U get the expected result.
If instead U have to pass through ClassB, then it will actually be a bit more complicated ...
Tell me what U think.

cheers
Fabrizio

tegbir said...

"Repeat step 2 if the returned Object has a writeReplace(), until an object returned doesn't override writeReplace()." This is not true, as ClassB also has a writeReplace which returns an object of type ClassB; that would put it in an infinite loop.
So it is:
"Repeat step 2 if the returned Object has a writeReplace(), until an object returned doesn't provide writeReplace() or writeReplace() returns an object of the same type as the one on which it is called." What do you think?