Recently I was asked to investigate an OutOfMemory issue with the PermGen running out of allocated space. It sounds simple: open up the heap dump, check out the classes that are loaded and figure out where the issue is. However, this happened in production and there was no heap dump available. This was on Sun Java SDK 1.6.x.
As a consequence of this exercise, I brushed up on some of my concepts of Java Garbage Collection and discovered some intricacies of the tools in the process.
Before we go any further, I would like to point out, with some authority gained from past experience, that analyzing and resolving OutOfMemory PermGen issues requires not only a clear understanding of the Java GC concepts but an intimate knowledge of your application as well. To cover the former, please take a look at the “Java Garbage Collection” details, especially the concept of generations and heap versus non-heap memory.
PermGen allocation is separate from the heap, and therefore the parameters that size it are different from the regular -Xms and -Xmx. Please take a look at the Java VM options, in particular -XX:PermSize, which sets the initial size, and -XX:MaxPermSize, which sets the maximum. Also, please note that PermGen holds the class metadata, class definitions and so on. Once a class is loaded, the objects referenced by its static fields live in the heap (young or old) along with any instances.
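To make the heap versus non-heap split concrete, here is a minimal sketch that lists the VM's memory pools through the standard java.lang.management API. The class name MemoryPoolReport is my own, and the pool names are VM-specific: on the Sun 1.6 VMs I have seen, the permanent generation shows up as a non-heap pool whose name contains "Perm Gen", but please treat that as an assumption and check the output on your own VM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class MemoryPoolReport {

    // Returns one descriptive line per memory pool of the requested type.
    static List<String> describePools(MemoryType type) {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != type) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            lines.add(pool.getName() + ": used=" + usage.getUsed()
                    + " bytes, max=" + usage.getMax() + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        // MemoryType has exactly two values: HEAP and NON_HEAP.
        for (MemoryType type : MemoryType.values()) {
            System.out.println(type + ":");
            for (String line : describePools(type)) {
                System.out.println("  " + line);
            }
        }
    }
}
```

Running this inside the application (or a small test harness started with the same VM options) shows which pool the PermSize options actually affect.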
Since we are tracking the PermGen, what we need to do is replicate the issue, which is easier said than done since you probably do not have enough information as to:
- In what flow of your application did it happen? Perhaps this can be gleaned from the logs as well as the stack traces.
- What was the load on the application that led to this?
Thereafter, what you could do is to lower the PermGen allocated to your application and perform the following steps:
- Specify -XX:+PrintGCDetails along with the -verbose:gc option for the VM.
- Specify -XX:+HeapDumpOnOutOfMemoryError so that the VM generates a heap dump on OutOfMemory. Note that sometimes even with this option specified the heap dump is not generated, since the VM becomes unstable when an OOM happens.
- Specify the options that list the classes loaded and unloaded by the VM, such as -verbose:class (or -XX:+TraceClassLoading and -XX:+TraceClassUnloading).
I would recommend that the first two settings above be part of any VM process, even one that is running in a production environment. They are of immense use in resolving OOM issues.
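One way to make sure these settings do not silently disappear from a startup script is to have the process check its own VM arguments. The sketch below is only an illustration: the class DiagnosticFlagCheck is hypothetical, and the flag spellings are the usual HotSpot ones (-verbose:gc, -XX:+PrintGCDetails, -XX:+HeapDumpOnOutOfMemoryError). It relies on RuntimeMXBean.getInputArguments(), which returns the options passed to the VM.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class DiagnosticFlagCheck {

    // The first two settings from the list above, in HotSpot command-line form.
    static final List<String> RECOMMENDED = Arrays.asList(
            "-verbose:gc",
            "-XX:+PrintGCDetails",
            "-XX:+HeapDumpOnOutOfMemoryError");

    // Returns the recommended flags that are absent from the given VM arguments.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<String>();
        for (String flag : RECOMMENDED) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments given to the VM itself, not the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        if (missing.isEmpty()) {
            System.out.println("All recommended diagnostic flags are set.");
        } else {
            System.out.println("Missing diagnostic flags: " + missing);
        }
    }
}
```

A check like this could run once at application startup and log a warning. It compares exact strings, so a flag written in a different but equivalent form would be reported as missing.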
Sometimes you will also see OOM PermGen stack traces in the log that resolve themselves, and the application continues to perform and run as planned. These ephemeral OOM PermGen stack traces do not seem to impact the longevity and stability of the process, but they still need to be investigated. The conclusion of such an investigation may simply be to increase the PermGen space allocation, and that would be the resolution.
The first thing to do is to inspect the heap dump. If the heap dump is not available, the only option is to replicate the issue.
If you are able to replicate the issue then the next step would be to inspect the heap dump and there are some very useful tools out there. The one that I found most useful was the Eclipse Memory Analyzer. You would need to review the usage and options of this tool and they are well documented. At this time, you have the following data set with you:
- The list of classes that have been loaded and unloaded.
- From that list, the set difference of loaded minus unloaded, i.e. all the classes that have not been unloaded.
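The second item can be computed mechanically from the class loading log. The sketch below assumes the log lines look like "[Loaded com.example.Foo from ...]" and "[Unloading class com.example.Foo]", which is the format I remember from the 1.6 VM; please verify it against your own output. The class name ClassLogDiff and the com.example class names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of any library.
class ClassLogDiff {

    // Assumed line prefixes of the class loading log.
    private static final String LOADED = "[Loaded ";
    private static final String UNLOADED = "[Unloading class ";

    // Returns the names of classes that were loaded but never reported as unloaded.
    static Set<String> stillLoaded(List<String> logLines) {
        Set<String> loaded = new TreeSet<String>();
        Set<String> unloaded = new TreeSet<String>();
        for (String line : logLines) {
            if (line.startsWith(LOADED)) {
                loaded.add(className(line.substring(LOADED.length())));
            } else if (line.startsWith(UNLOADED)) {
                unloaded.add(className(line.substring(UNLOADED.length())));
            }
        }
        loaded.removeAll(unloaded); // set difference: loaded minus unloaded
        return loaded;
    }

    // The class name runs up to the first space or closing bracket.
    private static String className(String rest) {
        int end = 0;
        while (end < rest.length() && rest.charAt(end) != ' ' && rest.charAt(end) != ']') {
            end++;
        }
        return rest.substring(0, end);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "[Loaded com.example.Foo from file:/app/classes/]",
                "[Loaded com.example.Bar from file:/app/classes/]",
                "[Unloading class com.example.Bar]");
        System.out.println(stillLoaded(sample)); // prints [com.example.Foo]
    }
}
```

This compares class names only. It does not distinguish the same class loaded by two different class loaders, nor a class that is unloaded and later reloaded, so treat the result as a list of candidates to look up in the heap dump rather than as proof of a leak.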
Fire up an OQL query (a select query: please see the OQL examples) and trace the graph to the entity that is responsible for retaining the class to evaluate a leak. At this juncture, familiarity with the codebase of the application is required.
If you are not able to replicate the issue and you do not get an OOM PermGen stack trace, do not lose heart. You could also try to run your application for a period of time, watch the PermGen grow, then introduce a PermGen collection as part of the Full GC and evaluate whether the allocation is reduced. Repeat the process of loading the application, introducing a PermGen collection and evaluating the heap dump to investigate the classes that are not being unloaded, since those could point to a possible leak. One thing to note is that a PermGen collection is generally not part of the Full Garbage Collection unless the appropriate VM option is specified (with the CMS collector, for example, class unloading has to be enabled with -XX:+CMSClassUnloadingEnabled) or the PermGen gets filled up so as to trigger a PermGen collection. The latter technique is demonstrated in a link in the references section below.
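A lightweight way to watch this from inside the process is the standard ClassLoadingMXBean, which exposes counters for loaded and unloaded classes. The following is a rough sketch (ClassUnloadProbe is a name I made up) that takes a snapshot, requests a collection and takes another snapshot. Keep in mind that System.gc() is only a request, and that it is ignored if the VM runs with -XX:+DisableExplicitGC.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of any library.
class ClassUnloadProbe {

    // Returns { currently loaded, total loaded since VM start, unloaded since VM start }.
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),
            bean.getTotalLoadedClassCount(),
            bean.getUnloadedClassCount()
        };
    }

    static String format(long[] s) {
        return "loaded=" + s[0] + ", totalLoaded=" + s[1] + ", unloaded=" + s[2];
    }

    public static void main(String[] args) {
        long[] before = snapshot();
        System.gc(); // a request only; the VM decides whether and what to collect
        long[] after = snapshot();
        System.out.println("before GC: " + format(before));
        System.out.println("after GC:  " + format(after));
        System.out.println("classes unloaded in between: " + (after[2] - before[2]));
    }
}
```

If the currently loaded count keeps climbing across repeated load cycles while the unloaded count stays flat, that is a hint (not proof) that classes are being retained.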
You could also try figuring out whether there is a class loader leak: when the application is undeployed, as in the case of an application deployed in a J2EE container such as JBoss, is the class loader responsible for the application made available for GC? This technique is also detailed in a link in the references section below.
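The idea behind such a check can be sketched with a WeakReference: keep only a weak reference to the application's class loader, drop the strong one, request a few collections and see whether the reference gets cleared. The code below is an illustration under my own assumptions, not the technique from the linked reference. LoaderLeakCheck is a made-up name, and an empty URLClassLoader stands in for the loader that a container would create per deployment.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper class, not part of any library.
class LoaderLeakCheck {

    // Requests GC up to 'attempts' times and reports whether the loader was cleared.
    static boolean isCollected(WeakReference<ClassLoader> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // a request only; collection is not guaranteed
            try {
                Thread.sleep(50L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Stand-in for the class loader of a deployed application.
        ClassLoader appLoader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<ClassLoader>(appLoader);

        System.out.println("collected while still referenced: " + isCollected(ref, 3));
        System.out.println("loader in use: " + appLoader); // keeps the strong reference alive

        appLoader = null; // simulates the container releasing the loader on undeploy
        System.out.println("collected after release: " + isCollected(ref, 10));
    }
}
```

In a real container the interesting case is the last line reporting false after an undeploy: something (a thread, a static field, a registered driver or listener) still holds the loader, and the heap dump is where you find out what.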
I also tried the jhat tool but gave up on it in favor of the Eclipse Memory Analyzer, since jhat required a lot of memory and was slow. I believe this has been improved in a later version of JDK 1.6, and you could try that as well.