Magnolia

Potential memory leak: investigation.

Details

  • Type: Task
  • Status: Resolved
  • Priority: Critical
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 3.5.5, 3.6
  • Component/s: None
  • Labels:
  • Description:

    Several users have reported memory issues. We're creating this issue to collect information, reports and other evidence. Please attach relevant files and leave comments on your experiences. Thanks!

    Current status:

    • We're investigating this issue, but at the moment we lack evidence of a real leak in Magnolia 3.5 when used with an external database (see comments below).
    • Magnolia 3.0 must not be used with Jackrabbit 1.3; use the delivered 1.0 version instead.
    • There have been too many architectural changes in 3.5 to backport the fixes to the 3.0 branch
    • The following definitely helps:
      • Using an external database such as MySQL
      • Using Magnolia 3.5.x (and Jackrabbit 1.3.x)
    • If you're experiencing OutOfMemoryError: PermGen space, you need to increase the -XX:MaxPermSize JVM setting.
    • If you're still having memory issues after applying the above advice, please report them here with the following information:
      • Operating system:
      • Java version:
      • Container (Tomcat, Jetty, ...) and version:
      • Precise JVM settings (JAVA_OPTS, CATALINA_OPTS, ...):
      • How is your container started:
      • Is there any specific operation that triggers your memory issues?
      • Any stack traces or relevant log files (as attachments, with your name in the filename, please)

Issue Links

Activity

Yuanhua Qu added a comment - 11/Jan/08 6:44 PM

I recorded the memory leak behaviour for 3.0.5 and 3.5. The heap still shows a growing trend and will eventually cause problems. These recordings come from a small test environment under a pretty small load. The leak crashed our production server, which hosts hundreds of sites under high load and has already been given a big JVM heap. A fast fix for this issue would surely save us from a catastrophe. Looking forward to good news on this.

Grégory Joseph added a comment - 11/Jan/08 7:53 PM

Could you please give us more details about your setup? The Java version, JAVA_OPTS and CATALINA_OPTS, and the PersistenceManager you use in Jackrabbit are the three pieces of information we need to put your document in perspective.
Thanks for the detailed report!

Grégory Joseph added a comment - 11/Jan/08 8:05 PM

What I can probably already tell you from this is that the graphs for 3.5 don't show flagrant evidence of a leak. At first sight, there are only minor GC hits. (We can see a major one on page 2 at about 35 minutes, but that's Magnolia 3.0.)
Reducing the Xmx parameter would help the test by forcing more major GC hits. After a handful of such hits, the graph will hopefully (or rather, hopefully not) show evidence of a leak. Until then, it seems like your VM is just using what it's been given.
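
One rough way to read such a graph, sketched in Python under stated assumptions: note the heap used right after each major GC and fit a straight line through those points. A slope near zero suggests the VM is just using what it has been given; a clearly positive slope suggests a leak. The function name `post_gc_slope` and the sample readings are hypothetical and are not taken from the attached graphs.

```python
def post_gc_slope(readings):
    """Least-squares slope of heap-used-after-major-GC readings.

    readings: list of (minutes_since_start, heap_used_mb) pairs, one per
    major GC, taken right after the collection finished.
    Returns MB per minute; needs at least two readings at distinct times.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two post-GC readings")
    mean_t = sum(t for t, _ in readings) / n
    mean_h = sum(h for _, h in readings) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in readings)
    if var_t == 0:
        raise ValueError("readings must span more than one point in time")
    cov = sum((t - mean_t) * (h - mean_h) for t, h in readings)
    return cov / var_t


# Hypothetical readings, for illustration only.
flat = [(10, 60.0), (20, 61.0), (30, 60.0), (40, 61.0), (50, 60.0)]
rising = [(10, 60.0), (20, 75.0), (30, 90.0), (40, 105.0), (50, 120.0)]

print("flat floor, MB/min:", round(abs(post_gc_slope(flat)), 3))
print("rising floor, MB/min:", round(post_gc_slope(rising), 3))
```

The minor-GC sawtooth is deliberately ignored here: only the floor reached after major collections says anything about objects that can never be freed.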

Mike Jones added a comment - 11/Jan/08 8:07 PM

We've had the exact same issue:

RHEL, JDK 1.6, JAVA_OPTS: -Xmx2048m -Xms512m

Happens with both Magnolia 3.0.5 > BDB & Magnolia 3.0 > Derby

Yuanhua Qu added a comment - 11/Jan/08 8:37 PM

My testing env setup:

tomcat 5.0.28
JAVA_OPTS=-server -Xms512m -Xmx512m
<PersistenceManager class="info.magnolia.state.berkeley.BerkeleyDBPersistenceManager" />
Here are jars we updated and added for 3.0.5:
derby-10.2.2.0.jar
jackrabbit-api-1.3.3.jar
jackrabbit-core-1.3.3.jar
jackrabbit-jcr-commons-1.3.3.jar
jackrabbit-text-extractors-1.3.3.jar
lucene-core-2.0.0.jar
magnolia-bdb-1.2.jar
PooledJNDIDatabasePersistenceManager.jar

For 3.5, I also use magnolia-bdb-1.2.jar, same as above.

As for the graphs in the doc, the first graph for each Magnolia version just records memory from startup until about 15 minutes later, with no hits at all: page 1 for 3.0.5 and page 5 for 3.5. You can see that memory usage was about 100M for 3.0.5 and 67M for 3.5.

Page 2 shows the memory leak trend for version 3.0.5 when consistent hits start 15 minutes after startup and continue until about 50 minutes after startup.
Page 6 shows the memory leak trend for version 3.5 when consistent hits start about 15 minutes after startup and continue until about 90 minutes after startup; the heap climbs to 210M. The leak certainly grows much more slowly than in version 3.0.5, but it still grows. It reaches 260M after 100 minutes.

I hereby attach a screenshot, Mag3.5Jackrabbit1.3.3_memGraph.JPG, of the heap usage when I tested v3.5.

Our production is using 3.0.5, so the situation is really bad.

Yuanhua Qu added a comment - 11/Jan/08 8:39 PM

My test uses JDK 1.6.0_03.

Sean McMains added a comment - 11/Jan/08 8:43 PM

Qu should be providing the salient details on our environment.

A few particularly interesting details that our experiments have turned up:

  • Due to our caching architecture, visitors' cookies were getting stripped out, which caused Tomcat to generate a new session for each page load. This wasn't a problem with the old Jackrabbit. Modifying our caching architecture so that sessions are preserved seemed to slow the rate of memory use considerably.
  • Qu's experiments this morning showed that with Magnolia 3.5, the issue was far less pronounced. As Gregory pointed out, there may not even be a leak there. MAGNOLIA-623 seems to indicate that session use on public was dropped in v3.5, so upgrading to v3.5 may be just another way to address the same sessions issue.
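
A back-of-the-envelope sketch of why one new session per page load adds up, assuming every request creates a session that lives until the container times it out. The function names, the request rate, the 30-minute timeout and the 20 KB retained per session are illustrative assumptions, not measurements from this issue.

```python
def live_sessions(requests_per_second, session_timeout_minutes):
    """Sessions alive at steady state when every request creates a new one.

    Each session lives until the container times it out, so the steady-state
    count is the arrival rate multiplied by the session lifetime.
    """
    return requests_per_second * session_timeout_minutes * 60


def session_heap_mb(requests_per_second, session_timeout_minutes, kb_per_session):
    """Heap held by those sessions, in MB (1 MB = 1024 KB)."""
    sessions = live_sessions(requests_per_second, session_timeout_minutes)
    return sessions * kb_per_session / 1024.0


# Hypothetical load: 10 requests/s, 30-minute timeout, 20 KB retained per session.
print(live_sessions(10, 30))        # 18000
print(session_heap_mb(10, 30, 20))  # 351.5625
```

Under these assumptions the heap held by sessions levels off once the first sessions start expiring, so on a graph it would look like a steep climb followed by a plateau rather than unbounded growth.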
Grégory Joseph added a comment - 11/Jan/08 8:59 PM - edited

HTTP session handling and usage definitely changed in 3.5, yes.

Qu, it would be really helpful if you could decrease Xms and Xmx (even as low as 128m and 256m); hopefully that would show more "major GC" hits in the graphs.

Yuanhua Qu added a comment - 11/Jan/08 9:26 PM

No idea how my env setup msg got another 3 copies 30 min after I first submitted. Weird.

I'll take Gregory's advice and set the heap low to test and for a longer period. Will update the result when it's done.

Grégory Joseph added a comment - 11/Jan/08 9:33 PM

Hmm, this probably happened because I've been restarting the server a couple of times just now, fixing a couple of quirks since the migration.
I'll delete the extra comments. Sorry about that.

Sean McMains added a comment - 14/Jan/08 4:49 PM

I saw your latest updates on the status, Gregory. I wanted to point out that, among your recommendations, it was moving to Jackrabbit 1.3.3 that caused the problem in our case.

I understand that you guys don't have the resources to backport the fixes to the 3.0 branch. However, if you're not going to do that, I would recommend warning people that upgrading to Jackrabbit 1.3.3 may not be a good idea until they're ready to move to Magnolia 3.5.

Grégory Joseph added a comment - 14/Jan/08 5:43 PM

Hmm, I didn't realize this. So you're saying that with Magnolia 3.0.x and Jackrabbit 1.0.x, memory usage was as stable as with Magnolia 3.5.x and JR 1.3.x?

Yuanhua Qu added a comment - 14/Jan/08 9:19 PM

I set the JVM max heap size to 128M, and memory usage did stay stable after each full GC (fgc). The graph is in the attachment Mag3.5Jackrabbit1.3.3_memGraph_128M.JPG. This is much better than Magnolia 3.0.5 with Jackrabbit 1.3.3, where memory climbs after each fgc. My concern with Magnolia 3.5 and Jackrabbit 1.3.3 is the impact on the application's performance if memory only stabilizes after each fgc. From the graph, we can see that the heap still grows fast after each minor collection. It looks like lots of objects live longer and get pushed to the old space, which will trigger an fgc when the old space is full.

Sean McMains added a comment - 14/Jan/08 9:24 PM

We haven't profiled the memory usage as closely with Magnolia 3.0.x and Jackrabbit 1.0.x, but that certainly seemed to be the case, yes.

Grégory Joseph added a comment - 15/Jan/08 2:08 PM

Sean: I edited the status to reflect your comment; it's still unclear why upgrading only Jackrabbit would make the issue worse, though.

Qu: the purpose of this last test was to see how the GC behaved with less memory, i.e. to try to detect a potential leak. Resetting your Xmx to a more viable limit will produce less scary graphs.

Another question now: could you provide some details as to how this was tested? Is this real or generated traffic? Author or public instance? It seems to match our internal tests on a public instance, browsing with the anonymous user, so that's a rather good sign on that side of things.

Sean McMains added a comment - 15/Jan/08 3:23 PM

Gregory, all of our testing was done on public – the edit stage doesn't appear to display the problem, or does so at a much less dramatic rate. (Given that it seems to be tied to not tracking sessions, this makes some sense.)

For our stats, I think Qu used JMeter to generate artificial traffic. But the problem first appeared in our production instance with real traffic. Qu, feel free to jump in with any details I've missed.

Yuanhua Qu added a comment - 15/Jan/08 4:35 PM

Yes. As Sean described, the test was done with JMeter (without cookies enabled for the HTTP requests) on a public instance, browsing as the anonymous user, as you guessed. It shows memory usage similar to testing 3.0.5 with JR 1.3.3 with persistent cookies enabled for all HTTP requests.

Ryan Gardner added a comment - 08/Feb/08 7:33 AM

Jackrabbit 1.4 was released recently, and perhaps has an influence on this issue? I was able to update the version in my parent-pom and rebuild the project with no problems... (Although I suspect dropping in the jar files would work just as well)

Jackrabbit 1.4's release notes indicate 220 bugfixes, and at a quick glance a few of them seemed like they might fix memory issues.

In any case, it would give another graph to help complete the set

Todd Farrell added a comment - 10/Mar/08 3:54 AM - edited

See attached image:
memory-leak_magnolia-CE-3.0.5_jackrabbit-1.3.1.png

We have consistently reproduced an OutOfMemory condition (memory usage pattern similar to attached image) using the Magnolia 3.0.x series.

  • Memory leak ONLY occurred with the CACHE DISABLED.
    • Cache is bypassed for any request that has request parameters, including the search feature provided with the samples.
  • We tried all possible combinations of the officially distributed Magnolia CE 3.0.2 / 3.0.3 / 3.0.5 with Jackrabbit 1.0.1 / 1.3.1.
  • We tried both the Derby persistence manager and the MsSqlPersistenceManager --> SQL Server 2005
  • We also tried Magnolia EE 3.0.5 with Derby (only)

Unfortunately, we haven't had the opportunity to test against the Magnolia 3.5.x series yet.

Our production setup:

  • Windows Server 2003 R2
  • Sun JDK 1.5.0 update 14
  • JBoss 4.2.2
  • Magnolia CE 3.0.5 with Jackrabbit 1.3.1
  • SQL Server 2005 (and Derby for local development environments - memory leak reproducible either way)
  • We have some custom filters etc. to integrate our application with Magnolia, but we used a standard Magnolia install as a control when load testing.
sebastian.frick added a comment - 14/Mar/08 12:04 PM

Has anyone compared EE 3.5.x/JR 1.3.3 with EE 3.5.x/JR 1.4?

Philipp Bracher [old account - now Philipp Bärfuss] added a comment - 08/May/08 2:31 PM - edited

OK, I wrote a JMeter test plan that does heavy authoring (three threads):

  • create pages
  • add 10 paragraphs
  • activate every now and then
  • activation uses versioning (but no workflow)

The setup I used was:

  • tomcat 5.5 (default magnolia bundle)
  • external db (h2)
  • -Xmx256M

I attached the graph of the tenured generation space, which shows:

  • it increases (slowly but steadily)
  • after stopping JMeter, the memory is not freed

As a next step I will test 3.6 (to see if the latest changes have an impact on that).

Philipp Bracher [old account - now Philipp Bärfuss] added a comment - 08/May/08 5:22 PM

The same test on 3.6 looks quite nice.

Note that the maximum memory usage was about 80MB (author & public in the same VM!). After test execution, memory usage dropped to 36MB.

Note also that the throughput was much better (up to 8 times faster).

Philipp Bracher [old account - now Philipp Bärfuss] added a comment - 09/May/08 1:42 PM

MAGNOLIA-2099 did the trick; a backport to 3.5 was possible.

Mike Jones added a comment - 14/May/08 4:12 PM

Hi,

Is this patch available for public use? We can download the patch file, but honestly we aren't quite sure how to apply it.
Do you have a compiled version/update for 3.5 (3.5.5?) that we can install to test?

Thx

Grégory Joseph added a comment - 24/Jul/08 2:20 PM

Mike, sorry for the late reply, but 3.5.8 was released a while ago now, and 3.6 is on the verge of being released, too.


People

Dates

  • Created: 11/Jan/08 6:05 PM
  • Updated: 17/Mar/09 7:15 PM
  • Resolved: 09/May/08 1:42 PM