Friday, August 19, 2011

Assorted facts about JBoss. Fact 2: classloading. Broken by default.

The classloading situation in JBoss is a mess. You know, evolutionary kind of mess. They began with something messy, then started to add more things, configuration parameters, bells and whistles ... Some old classloading problems went away, just to be replaced by new problems.

I had my share of ClassNotFoundExceptions, NoClassDefFoundErrors, and LinkageErrors before. Not all of them were caused by JBoss, but those that were caused by JBoss were the toughest to resolve.

Recently JBoss (JBoss Application Server 6) appeared again on my professional horizon. And one of the first problems I ran into was a LinkageError. Even the problem with the StAX API jar (see Fact 1) happened later.

The deployment of the application fails. Sometimes. And sometimes it is OK. If it fails, the error is
java.lang.LinkageError: loader constraint violation:
    loader (instance of org/jboss/classloader/spi/base/BaseClassLoader)
    previously initiated loading for a different type
    with name "org/xml/sax/Attributes"
The class name is not always org.xml.sax.Attributes, but it is always a class from org.xml.sax. And if the deployment is OK, the same error happens later, at runtime.

Nothing new. This error screams "Duplicate class". So I looked around and found jtidy-4aug2000r7-dev.jar packaged in EAR/lib (and only there) which has its own copy of org.xml.sax and org.w3c.dom classes. JMX classloader bean for the application confirmed that the classes are coming from the application's EAR.
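Checking whether a jar bundles its own copy of a package does not need any special tooling. A minimal sketch using only the JDK; the class name JarPackageScanner and its method are made up for illustration, and the jar path is whatever you pass in:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper: lists the .class entries a jar carries under a given package.
public class JarPackageScanner {

    public static List<String> classesUnder(Path jar, String packagePrefix) throws IOException {
        // Jar entries use '/' separators, e.g. org/xml/sax/Attributes.class
        String dir = packagePrefix.replace('.', '/') + "/";
        try (JarFile file = new JarFile(jar.toFile())) {
            return file.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.startsWith(dir) && name.endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Usage: java JarPackageScanner <path-to-jar> <package>
    public static void main(String[] args) throws IOException {
        for (String name : classesUnder(Paths.get(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

Running it against a suspicious jar with org.xml.sax and then org.w3c.dom as the package shows which classes the jar duplicates.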

The reason why the classes are packaged into jtidy-4aug2000r7-dev.jar is not really important here. But I was really surprised (well, initially) that JBoss uses these classes instead of the JDK classes. I deployed the application on the stock version of JBoss AS 6 without any modifications to the classloading configuration. There was not even a jboss-app.xml (I was planning to add it, just in case), let alone other classloading-specific files. So I expected JBoss to have JEE-compatible classloading behavior.

Initially I did not have time to investigate the problem. After I notified the project owner of the problem, the decision was made: remove the JTidy jar from EAR/lib and continue. JTidy is used in some really obscure piece of code that is not called that often. We can deal with this later. There is even a chance that this functionality will be rewritten to get rid of JTidy. But right now we need a version of the application running in JBoss.

I did just that and went ahead. But I kept wondering. I could not understand why the presence of jtidy-4aug2000r7-dev.jar in EAR/lib causes such an error. It goes against all my knowledge and understanding of, and experience with, Java classloading. Except that I am dealing with JBoss. But even JBoss would not do that, would it?

More importantly, that problem might have a much broader effect on the project, but I could not even imagine what kind of effect. Most likely a negative one, that much I was sure of.

Finally I have got some spare time. I have created a very simple application with a single EJB module with one SessionBean and jtidy-4aug2000r7-dev.jar as a dependency, packaged as an EAR application. After deploying it into JBoss I went into JMX classloader bean and verified that org.xml.sax classes are coming from the test application. The SessionBean has a business method that does Class.forName() and then returns getProtectionDomain().getCodeSource() of the loaded class. I created a simple EJB client application that calls the business method with different class names. All the org.xml.sax and org.w3c.dom classes present in jtidy-4aug2000r7-dev.jar were indeed coming from this jar file.
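The heart of that business method can be sketched in plain Java. This is not the original bean: the EJB annotations are left out so that it runs standalone, and the class and method names are invented for the example. A null code source is what the JVM reports for classes loaded by the primordial (bootstrap) class loader.

```java
import java.security.CodeSource;

// Hypothetical stand-in for the SessionBean's business method.
public class CodeSourceProbe {

    /**
     * Loads the class by name using the caller's class loader and returns
     * the location it was loaded from, or null when there is no code source
     * (which is the case for classes coming from the bootstrap loader).
     */
    public static String locate(String className) throws ClassNotFoundException {
        Class<?> clazz = Class.forName(className);
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String[] names = {"java.lang.String", "org.xml.sax.Attributes", "CodeSourceProbe"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Inside a container the interesting part is which class loader Class.forName() ends up using: called from the bean, it is the application's loader, so the answer reflects what the application actually sees.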

Next I created another jar file, manually packaging some classes from various places and packages like javax.xml.bind, org.dom4j, org.hibernate, java.text, javax.management, etc. I replaced the jtidy jar with this new jar file and repeated the test. The business method returned null only for java.* classes, which means that only those classes were coming from the primordial class loader. All other classes present in the jar were loaded from it. Well, well, well, JBoss at it again.

Trying to find anything specific about the problem on the net did not help. Classloading problems in JBoss are a really hot topic after all!

If nothing else helps ... Use the Source, Luke!

Actually, even before going deep I noticed one interesting thing in the JBoss JMX Console. It appears that my demo application has a classloader domain JMX bean even though I did not have any JBoss-specific deployment descriptors. I clicked on it and again one thing stood out, just screaming "Look at me":
ParentPolicyName, MBean Attribute, AFTER_BUT_JAVA_BEFORE.

"After but java before". Sounds familiar. pseudoTransactionEnlistment anyone? Or maybe BOZOSLIVEHERE?

The rest was easy. The biggest problem was to get the right source files quickly. You can't nowadays just download a fat zip or gz file with all the sources in it. After that was done, a bit of grepping and the like reveals the truth:

Class org.jboss.classloader.spi.ParentPolicy with some predefined instances like BEFORE, AFTER, BEFORE_BUT_JAVA_ONLY, etc. The comments around these predefined instances explained the meaning. In my case it was AFTER_BUT_JAVA_BEFORE. Except that the source file claims that AFTER_BUT_JAVA_BEFORE means "Java and Javax classes before, everything else after" and my tests show that javax.* classes also come from the jar file in EAR/lib. A bit more looking around led me to this piece of code in class org.jboss.classloading.spi.dependency.Module:
public ParentPolicy getDeterminedParentPolicy()
{
    if (isJ2seClassLoadingCompliance())
        return ParentPolicy.BEFORE;
    else
        return ParentPolicy.AFTER_BUT_ONLY_JAVA_BEFORE;
}

Since I do not have any JBoss specific deployment descriptors isJ2seClassLoadingCompliance() returns false resulting in ParentPolicy.AFTER_BUT_ONLY_JAVA_BEFORE being used. Not AFTER_BUT_JAVA_BEFORE. The comments next to AFTER_BUT_ONLY_JAVA_BEFORE in ParentPolicy.java clearly match the observed behavior: "Java classes before, everything else after". What am I missing?

Turns out there is one more small thing: a copy-paste error in the definition of AFTER_BUT_ONLY_JAVA_BEFORE:
/** Java and Javax classes before, everything else after */
public static final ParentPolicy AFTER_BUT_JAVA_BEFORE =
    new ParentPolicy(ClassFilterUtils.JAVA_ONLY,
                     ClassFilterUtils.EVERYTHING,
                     "AFTER_BUT_JAVA_BEFORE");

/** Java classes before, everything else after */
public static final ParentPolicy AFTER_BUT_ONLY_JAVA_BEFORE =
    new ParentPolicy(ClassFilterUtils.NOTHING_BUT_JAVA,
                     ClassFilterUtils.EVERYTHING,
                     "AFTER_BUT_JAVA_BEFORE");
Mystery solved. I added jboss-app.xml to the EAR with <loader-repository-config>java2ParentDelegation=true</loader-repository-config> (I might just as well have added jboss-classloading.xml), redeployed the application, and sure enough I got BEFORE as ParentPolicyName in the JMX Console. More importantly, I now have the expected classloading behavior. For each class present in the test jar in EAR/lib, both the JMX Console and my SessionBean load the class not from the test jar but from some other place like the JDK or jars from <jboss>/common/lib.
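For those who have not met "parent last" classloading before, here is roughly what "Java classes before, everything else after" boils down to. This is a toy sketch on top of URLClassLoader, not JBoss code: the class name and the parentFirst() helper are invented, and the real ParentPolicy works with class filters rather than a hardcoded prefix check.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Toy illustration of an "after, but java.* before" delegation policy.
public class JavaOnlyBeforeClassLoader extends URLClassLoader {

    public JavaOnlyBeforeClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /** Only java.* goes to the parent first; note that javax.* does not match. */
    public static boolean parentFirst(String className) {
        return className.startsWith("java.");
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (parentFirst(name)) {
                    // Standard delegation: parent first.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Look in our own jars first ...
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notLocal) {
                        // ... and ask the parent only if we do not have the class.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With such a loader any javax.*, org.xml.sax.* or org.w3c.dom.* class found in the loader's own jars wins over the JDK copy, which is exactly the kind of setup that ends in a loader constraint violation.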

I have mentioned above that I was planning to add jboss-app.xml to the application anyway because I do not trust JBoss. Boy I was right. The end result for me would have been the same but I would have missed all that fun.

But ... Who in their right mind comes up with such interesting classloading logic?? What were they trying to achieve? Why the hell do I have to explicitly "opt in" to get the most sensible classloading configuration? *


* Note: ideally. The current state of classloading affairs in JEE containers makes it much harder than necessary. Internal container classes and classes from various third-party jars that the container is using leak into an application. This is a big deal even if the application does not have conflicts with those third-party jars. In case of conflicts all bets are off. Granted, containers provide some mechanisms to fine-tune classloading, but these mechanisms do not always work. Yes, JBoss, it is about you. But I do think that this "delegate to the parent first except when in WAR" is the most sensible classloading configuration and definitely the one to start with and to try to stick to as much as possible.

Monday, August 15, 2011

Assorted facts about JBoss. Fact 1: StAX (Streaming API for XML) and the meaning of -711357515002332258.

Every time I have to do some serious work with JBoss I come across a situation that requires patching JBoss. HelloWorld kinds of applications tend to work, but as soon as things get complicated there is always something...

This time it is JBoss6 and StAX API. You see, there is <jboss>/lib/endorsed directory with some files in it. Normally if you start jboss with <jboss>/bin/run[.bat] the JVM is started with -Djava.endorsed.dirs=<jboss>/lib/endorsed

No problem, it is the desired and documented behavior if one wants to have newer versions of some APIs available in JDK. But <jboss>/lib/endorsed/stax-api.jar is a bit different. It is there for the sake of JDK 1.5. As of JDK 1.6 StAX is part of JDK itself. And it is not like JBoss packages a better or newer version of StAX. So if you run JBoss on JDK 1.6, do yourself a favor: delete <jboss>/lib/endorsed/stax-api.jar right now.

The solution for JDK 1.5 is not so simple because JDK 1.5 does not provide StAX. But first, what is the problem? This is it:
java.io.InvalidClassException: javax.xml.namespace.QName;
local class incompatible: stream classdesc serialVersionUID = -9120448754896609940,
local class serialVersionUID = -711357515002332258
<jboss>/lib/endorsed/stax-api.jar contains more than just StAX classes. It also contains old versions of classes that have long been part of the JDK. And because the classes are in an endorsed jar, they override the standard JDK classes.

If you look into the JDK source code you will see that class javax.xml.namespace.QName goes to some lengths to initialize private static final long serialVersionUID with some known good value. The version packaged in <jboss>/lib/endorsed/stax-api.jar does not define the serialVersionUID field, leaving you at the mercy of the JVM algorithm that calculates the serial version UID. Which produces -711357515002332258 in this particular case.
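The difference between a declared and a calculated serial version UID is easy to see with java.io.ObjectStreamClass. A small sketch; the two nested classes are made up for the demonstration, and the last line simply prints whatever the QName of the running JVM reports:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionUidDemo {

    /** Declares its UID explicitly, so the value survives changes to the class. */
    public static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
        String value;
    }

    /** No declared UID: the JVM derives one from the shape of the class. */
    public static class Computed implements Serializable {
        String value;
    }

    /** Returns the UID the serialization machinery will use for the given class. */
    public static long uidOf(Class<? extends Serializable> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Pinned:   " + uidOf(Pinned.class));
        System.out.println("Computed: " + uidOf(Computed.class));
        // Depends on which QName class the JVM actually loaded.
        System.out.println("QName:    " + uidOf(javax.xml.namespace.QName.class));
    }
}
```

Run on both sides of a remote call, the QName line is a quick way to tell whether the two JVMs agree on the class.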

Bad luck if you have a serialized instance of a class which has a non-transient field of type javax.xml.namespace.QName. Or if you have QName as a parameter in one of your remote interfaces.

So I fixed the problem by removing stax-api.jar since I am running under JDK 1.6 and went ahead.

The simplest solution for JDK 1.5 is probably deletion of everything that is not under javax.xml.stream from <jboss>/lib/endorsed/stax-api.jar. There are also stax-api jars around that include only javax.xml.stream.* classes.

But still ... One thing bothered me. This is quite an easy mistake to make, especially if this file was added to the endorsed dir some time ago. I can see that it is present in JBoss5; I did not check earlier versions. But come on, is it that difficult to review these things for every major release?

I can't be the first one to hit this problem. A bit of googling brought me here (JBPAPP-4223). OK, a bug is reported but I guess nobody is going to do a thing about it. After all it was reported on the 5th of May 2010, JBoss 6.0.0.Final was released half a year later, still with the problem.

And then I found this little gem. The beauty here is the recommendation of the JBoss EJB3 Lead Developer. Just read it. He seriously proposes to add the broken stax-api.jar to the client endorsed jar set. WTF?! JBoss EJB3 Lead Developer? No kidding?

Am I really surprised? Not at all.

Thursday, August 11, 2011

Mule, HTTP and transaction management

Mule has support for transactions, see here. So if the inbound and outbound endpoints are transactional, like JDBC or JMS, it is easy to make sure the messages are handled transactionally.

It is not so easy if an endpoint is not transactional. For example we have a configuration with a jms:inbound-endpoint and an http:outbound-endpoint. A message is retrieved from the queue and sent via HTTP to some receiver. Of course the message must be removed from the queue only if it is successfully received (or handled) by the receiver.

The inbound-endpoint configuration is easy:
    <inbound>
        <jms:inbound-endpoint queue="${queue_name}" connector-ref="jmsConnector">
            <jms:transaction action="ALWAYS_BEGIN"/>
        </jms:inbound-endpoint>
    </inbound>

ALWAYS_BEGIN ensures that a new transaction is started and a message is received in this transaction.

This leaves the outbound-endpoint. Just saying
        <http:outbound-endpoint address="${http_address}"/>

is not enough because it automatically means action="NONE". Mule throws an exception complaining that the outbound endpoint cannot join the active transaction because it is configured with action="NONE". Fair enough, let's change this into
        <http:outbound-endpoint address="${http_address}">
            <http:transaction action="JOIN_IF_POSSIBLE"/>
        </http:outbound-endpoint>

But this does not work because "http:transaction" is not recognized as a valid element. Ooops. This is logical, HTTP is not transactional per definition. But we really need JMS to be transactional.

The solution?
        <http:outbound-endpoint address="${http_address}">
            <jms:transaction action="JOIN_IF_POSSIBLE"/>
        </http:outbound-endpoint>

Mule is happy with this and it does the right thing. If there is a problem connecting to the target or sending the message to it, the HTTP endpoint makes sure that the message gets an "exception payload" set. This triggers Mule's transaction support to roll back the active transaction.

This is not a generic solution, but it suits us: no messages are lost; the message is back in the queue and is redelivered later. The only problem with this approach is that the transaction can be rolled back after the message was successfully received by the HTTP receiver. This results in redelivery of the same message to the HTTP receiver which must be able to handle this.
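One common way for a receiver to cope with redelivery is to remember which messages it has already handled. A minimal sketch, assuming every message carries a unique id; the class and method names are hypothetical and nothing here is Mule-specific:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical receiver that silently drops messages it has seen before.
public class IdempotentReceiver {

    // Ids of messages that were already accepted (in memory only).
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if the message was processed, false if it was a duplicate.
     * The id is recorded before processing, so a message whose processing
     * fails is not retried by this sketch.
     */
    public boolean receive(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false; // redelivery of a message we already accepted
        }
        process(payload);
        return true;
    }

    /** Placeholder for the real work. */
    protected void process(String payload) {
        System.out.println("processing: " + payload);
    }
}
```

An in-memory set is lost on restart, so a real receiver would need something more durable; that goes beyond this sketch.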

But this is not a big deal in our case. It so happens that most of the time the message ends up in a dispatcher that looks in its registry for subscribers. The first successful delivery of the message caused the subscribers to unsubscribe, so the dispatcher just silently drops the message.

In some other cases the message is just a notification of some kind, so nobody really cares if the same notification appears twice.

The only case where this might cause some trouble in our system is when such a message results in the creation of a BPEL process instance. Most of the time the newly created instance fails with a "conflicting receive" error because the instance created after the first message delivery is still running and the process has some <onMessage> with a correlation set. But these cases are easily recognized by the administrators. And I must say, if one is using BPEL then "conflicting receive" is the least of one's worries.