[MAGNOLIA-3390] Prevent OOME and GC load during activation of large data sets Created: 16/Nov/10 Updated: 19/Dec/16 Resolved: 01/Dec/10 |
|
| Status: | Closed |
| Project: | Magnolia |
| Component/s: | activation |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.9, 4.4 |
| Type: | Improvement | Priority: | Blocker |
| Reporter: | Joerg von Frantzius | Assignee: | Philipp Bärfuss |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Template: |
|
| Acceptance criteria: |
Empty
|
| Task DoD: |
[ ] Doc/release notes changes? Comment present?
[ ] Downstream builds green?
[ ] Solution information and context easily available?
[ ] Tests
[ ] FixVersion filled and not yet released
[ ] Architecture Decision Record (ADR)
|
| Date of First Response: |
| Description |
|
During activation, the activated data is currently held completely in memory for each activation request sent from author to public.

The problem

When e.g. 250MB are activated on an author instance with 4 subscribers, these 250MB are allocated 4 times in a row in RAM and garbage-collected afterwards. Even if no OutOfMemoryError occurs, this puts a high load on the garbage collector, likely forcing the VM to perform "stop the world" full collections and leaving the author instance unresponsive for editors. Given large enough binary data, or simultaneous attempts at activating it, any maximum heap size can be exceeded.

Current implementation

This appears to be due to the default behaviour of java.net.URLConnection.getOutputStream() as used by info.magnolia.module.exchangesimple.Transporter: it returns a subclass of ByteArrayOutputStream that caches the whole request body in memory, presumably in order to determine the Content-Length before actually sending the request.

Proposed solution

The solution is to use "chunked transfer coding" as defined in RFC 2616. This must be explicitly enabled by calling java.net.HttpURLConnection.setChunkedStreamingMode(int) prior to getOutputStream(). I verified via debugger that doing so yields a sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream (extends FilterOutputStream) instead of a sun.net.www.http.PosterOutputStream (extends ByteArrayOutputStream).

Chunking requires the public server to be HTTP/1.1 compliant. In case HTTP/1.1 compliance poses a problem, e.g. with proxied public servers or unusual HTTP servers, chunking of activation requests should be configurable. There could e.g. be a configuration NodeData "server/activation/subscribers/<subscribername>/useRequestChunking" with default value "true". |
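The proposed fix can be sketched as follows. This is a minimal illustration, not the actual Transporter code: the class and method names are assumptions, and the chunk size of 64KB is an arbitrary choice.

```java
import java.net.HttpURLConnection;

// Hypothetical sketch of enabling chunked transfer coding before
// getOutputStream() is called, so the request body is streamed in
// chunks instead of being buffered whole in a ByteArrayOutputStream.
public class ChunkedActivationSketch {

    // Chunk size is an assumption; any positive value enables chunking.
    static final int CHUNK_SIZE = 64 * 1024;

    static HttpURLConnection configure(HttpURLConnection con) throws Exception {
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        // Must be called before connecting / before getOutputStream();
        // this is what switches the JDK to StreamingOutputStream.
        con.setChunkedStreamingMode(CHUNK_SIZE);
        return con;
    }
}
```

After configure(), writing the serialized activation payload to con.getOutputStream() sends it chunk by chunk rather than holding all of it in memory at once.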
| Comments |
| Comment by Philipp Bärfuss [ 18/Nov/10 ] |
|
Thanks for reporting and outlining this issue. We are going to change that for 4.4 and we will most likely backport it to 4.3.x. |
| Comment by Jan Haderka [ 01/Dec/10 ] |
|
Please correct me if I'm wrong. |
| Comment by Philipp Bärfuss [ 01/Dec/10 ] |
|
There is no need to use chunked mode as long as we can use setFixedLengthStreamingMode(), which we do now; in that case the content is not buffered (see the Javadoc). Unfortunately this method only takes an int (not a long) value, which is why I switch to chunked mode if the content to be sent is bigger than 2GB. As it is very likely that you would run into memory issues with content bigger than 2GB anyway, I think it is alright not to make this configurable and simply let it fail. |
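The decision logic described in this comment can be sketched like so. The class and method names are assumptions for illustration, not the shipped code, and the chunk size is arbitrary.

```java
import java.net.HttpURLConnection;

// Hypothetical sketch: prefer fixed-length streaming (no buffering,
// no HTTP/1.1 chunking requirement) and fall back to chunked transfer
// coding only when the body exceeds what an int can express.
public class StreamingModeChooser {

    // setFixedLengthStreamingMode(int) caps out at Integer.MAX_VALUE (~2 GB)
    static boolean needsChunking(long contentLength) {
        return contentLength > Integer.MAX_VALUE;
    }

    static void configureStreaming(HttpURLConnection con, long contentLength) {
        if (needsChunking(contentLength)) {
            // over ~2 GB: chunked transfer coding is the only streaming option
            con.setChunkedStreamingMode(64 * 1024); // chunk size is an assumption
        } else {
            // body length fits in an int: declare the exact Content-Length,
            // and the JDK streams the body without buffering it
            con.setFixedLengthStreamingMode((int) contentLength);
        }
    }
}
```

Note that later JDKs (Java 7+) added a setFixedLengthStreamingMode(long) overload, which removes the 2GB limitation this workaround addresses.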
| Comment by Jan Haderka [ 01/Dec/10 ] |
|
Fair enough then. |