[MGNLEE-207] Random encoding problems with Apache and GZIP filter Created: 23/Jun/11  Updated: 17/Dec/12  Resolved: 17/Dec/12

Status: Closed
Project: Magnolia DX Core
Component/s: None
Affects Version/s: 4.4.3
Fix Version/s: 4.5.7

Type: Bug Priority: Major
Reporter: Leo Lozes Assignee: Jan Haderka
Resolution: Duplicate Votes: 0
Labels: apache, encoding, filter, gzip
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

System: Amazon EC2
CPU: Xeon X5550 @ 2.67GHz w/ 8MB cache (almost idle, without other competing applications).
RAM: 18GB (only 3GB for Magnolia).
Disk: EBS volumes (all with 5GB to 20GB available)

OS: Oracle Enterprise Linux 5.3 x86_64, with EPEL and CentOS Yum repositories
Java: Oracle / Sun JDK 1.6.0_24 64-bits
Tomcat: version 6.0.32, ports 18080 (http) and 18009 (ajp)
Apache: version 2.2.3, port 80
AJP: mod_proxy_ajp


Attachments: apache-gzip-magnolia-gzip.png (PNG), apache-nogzip-magnolia-gzip.png (PNG), headers and content screenshots.rar
Issue Links:
causality
is causing MAGNOLIA-4591 GZip filter adds strange binary chara... Closed
duplicate
duplicates MAGNOLIA-3821 magnolia gzip compressed cache someti... Closed
relation
is related to MAGNOLIA-4413 GZip filter breaks cached forms in ch... Closed

 Description   

We are having some random scrambled resources / pages in our testing environment.

This seems to be caused by an incompatibility between the Apache and Magnolia gzip configurations.

Apache VirtualHost configuration:

<VirtualHost *:80>
    ServerName magnolia-formacion.*******.com
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass / ajp://localhost:18009/
    # Only when we want to enable GZIP in Apache:
    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript
</VirtualHost>

GZIP configurations tested:

Direct HTTP Magnolia with gzip - OK
Direct HTTP Magnolia without gzip - OK
Apache without gzip, Magnolia without gzip - OK
Apache with gzip, Magnolia without gzip - OK
Apache with gzip, Magnolia with gzip - OK
Apache without gzip, Magnolia with gzip - WRONG

What does WRONG mean? Randomly, resources get scrambled. Sometimes the main HTML resource, sometimes one (or more) CSS resources, or maybe JavaScript. The pages appear generally broken in random, funny ways. But not always. If you reload the same page over and over again it changes almost every time. Then it looks OK. Then it's broken again. Then it's OK three times in a row. Etc.

What does SCRAMBLED mean? The resources look like random binary gibberish, but almost certainly NOT pure gzip-compressed data. They contain many occurrences of the UTF-8 encoding of the "unknown character" (the replacement character U+FFFD), EF BF BD in hexadecimal, usually rendered as a square standing on one vertex with a question mark in the middle.
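
One possible reading of the EF BF BD sequences, offered only as a sketch and not as a confirmed diagnosis: if gzip-compressed bytes are decoded as UTF-8 text somewhere along the way, every undecodable byte is replaced with U+FFFD, and the result is no longer valid gzip data. The page content below is made up for illustration.

```python
import gzip

# Hypothetical page content, compressed the way a gzip filter would compress it.
original = "<html><body>Hello</body></html>".encode("utf-8")
compressed = gzip.compress(original)

# A client that is never told the body is gzipped treats the raw bytes as text.
# Bytes that are not valid UTF-8 are replaced with U+FFFD.
as_text = compressed.decode("utf-8", errors="replace")

# Re-encoding that text yields EF BF BD wherever a byte could not be decoded.
mangled = as_text.encode("utf-8")

print(compressed[:2].hex())           # the two gzip magic bytes
print(b"\xef\xbf\xbd" in mangled)     # whether replacement sequences are present
print(mangled == compressed)          # whether the data survived unchanged
```

The second gzip magic byte (0x8b) is never valid as a standalone UTF-8 byte, so at least one replacement sequence appears in any gzip stream treated this way.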



 Comments   
Comment by Christian Ringele [ 23/Jun/11 ]

I have the same issue on Windows 2003 with Apache 2.1; mod_proxy_ajp is not in use (just mod_jk.so & JKMount).
Only content gzipped by the GZip filter fails; gzipped content from the cache works fine.
This Tomcat has been in use for years for about 5 Magnolia instances. None of them caused problems until a new instance based on 4.4.3 was installed.
The newest of the old instances is a 4.3.4; there is additionally one 4.2.x and even two 3.6.x, all working fine.

The cause of the problem seems to be a change between 4.3.x and 4.4.x.

Comment by Jan Haderka [ 23/Jun/11 ]

Apache without gzip, Magnolia with gzip

IMHO this configuration can't ever work. Magnolia sees that the client accepts gzip (as Apache doesn't strip that info from the request header) and therefore encodes the response. However Apache strips off the gzip encoding info w/o actually decoding the response data, hence clients get encoded data w/o knowing that it is encoded. Some browsers tend to analyze incoming data and would figure it out and decode anyway, but others would not. Also, when you reload the page and the incoming data doesn't seem correct to the browser, it sometimes chooses to display a previously cached correct version of the page (I've seen this with IE and Safari).

My recommendation would be to always keep the encoding settings of Apache and Magnolia in sync. There is currently no way for Magnolia to automatically figure out the configuration of the Apache server that is in front of it.
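
The failure mode described in this comment can be sketched from the client's side. This is a minimal illustration, assuming a client that relies only on the Content-Encoding response header to decide whether to decompress; the function name client_view and the sample page are hypothetical.

```python
import gzip


def client_view(headers: dict, body: bytes) -> bytes:
    """Return the body as a strictly header-driven client would see it."""
    lower = {name.lower(): value for name, value in headers.items()}
    encoding = lower.get("content-encoding", "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No encoding header: the bytes are passed through untouched.
    return body


page = b"<html><body>Hello</body></html>"   # made-up content
backend_body = gzip.compress(page)          # what the backend filter would emit

# Header intact: the client decodes and sees the page.
intact = client_view({"Content-Encoding": "gzip"}, backend_body)

# Header stripped by an intermediary, body left compressed: the client sees raw gzip.
stripped = client_view({"Content-Type": "text/html"}, backend_body)

print(intact == page)
print(stripped[:2].hex())
```

Browsers that sniff content may still recover in the second case, which would be consistent with the intermittent behavior reported, but that part is not modeled here.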

Comment by Leo Lozes [ 23/Jun/11 ]

We attached the headers of two requests / responses, one with gzip activated in Apache and the other without it. As you can see, the two sets of headers are exactly the same (except for the time, and it's not Photoshop).

So we don't really understand your comment "However Apache strips off the gzip encoding info" ...

Comment by Jan Haderka [ 23/Jun/11 ]

Thx for the extra info. I would expect header being stripped off to cause the issue in this case.

Few more questions:

  • did you configure Apache not to include the Server header? Otherwise I would expect to see something like the snippet below in the response when going through Apache
    Server:Apache/2.2.3 (CentOS)
    
  • any reason why you are using 2.2.3 rather than the latest Apache? (googling for "issues apache 2.2.3 gzip" reveals a bunch of issues related to content not being encoded or being double encoded, depending on the specific Apache settings, although I admit that w/o knowing your exact configuration it is difficult to say whether any of them is an exact match)
  • when you disabled the gzip on apache, did you also disable x-gzip?
  • could you provide the headers, incl. the first few bytes of the response, from a successful and an unsuccessful attempt?
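
One way to collect the headers and first bytes asked for above is a small script such as the following. It is only a sketch: the helper names summarize and capture are made up, the list of headers printed is an assumption about what is relevant, and the URL in the usage comment assumes the direct Tomcat port from the environment description. Python's urllib does not decompress responses on its own, so the bytes shown are what came over the wire.

```python
import urllib.request


def summarize(status: int, headers: dict, body: bytes, nbytes: int = 16) -> str:
    """Format the status, a few relevant headers and the first bytes of a response."""
    lower = {name.lower(): value for name, value in headers.items()}
    lines = [f"status: {status}"]
    for name in ("server", "content-type", "content-encoding", "transfer-encoding"):
        lines.append(f"{name}: {lower.get(name, '<absent>')}")
    lines.append(f"first bytes: {body[:nbytes].hex(' ')}")
    return "\n".join(lines)


def capture(url: str) -> str:
    """Fetch url while advertising gzip support and summarize the raw response."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return summarize(response.status, dict(response.headers.items()), response.read())


# Example usage (run several times and compare good and scrambled attempts):
# print(capture("http://localhost:18080/"))
```

A body that starts with 1f 8b while content-encoding is reported as absent would point to compressed data being delivered without its encoding header.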
Comment by Christian Ringele [ 24/Jun/11 ]

I have the same behavior on an Apache 2.1 with no specific GZip encoding configuration.
It is an Apache configuration which works fine with 4.3.x and lower Magnolia versions, but not with 4.4.x.

Comment by Jan Haderka [ 24/Jun/11 ]

on an Apache 2.1

any reason why you are using an even older version of Apache?

with 4.3.x and lower Magnolia versions, but not with 4.4.x

by x you mean the latest ones? 4.3.8 and 4.4.4?

Comment by Christian Ringele [ 24/Jun/11 ]

OK, I checked the system again, and good that you asked: the Apache is 2.2.6 and not 2.1 (that's an old artifact not used anymore).

Apache 2.2.6 and not using mod_proxy_ajp or any specific gzip configuration.

So here is the definitive list of the instances in use, all served through the same Apache:
3x 3.6.3 -> all work fine
2x 4.3.1 -> all work fine
2x 4.4.3 -> errors as described above (was 4.4.2 before, with the same problem)

Comment by Leo Lozes [ 27/Jun/11 ]

Hi Jan,

1. The full configuration of Apache is in our first message; we didn't configure it to send the Apache 2.2.3 header. It's only a "gateway" that forwards the requests to the corresponding Tomcat.
2. We use 2.2.3 because it's the one provided by CentOS, and it's probably the most stable one (release 43 of 2.2.3 from Red Hat, with tuning I suppose).
3. In Apache, you just activate or deactivate "deflate", there is no specific configuration for gzip or x-gzip (as far as we know at least).

And we'll try to get the response examples you asked for.

Thanks

Comment by Leo Lozes [ 27/Jun/11 ]

Here are screenshots of both headers and content for scrambled pages and "good" pages.

Comment by Jan Haderka [ 04/Jul/11 ]

The headers in the attached rar file look different from the ones you originally submitted with this issue. In the attached rar file, when the content is scrambled, the response headers do not contain Content-Encoding (gzip), the Transfer-Encoding header is set to chunked, and the Magnolia registration header is missing completely. I believe in this case it is Apache sending cached gzipped content w/o the headers that would allow the browser to recognize that the content is actually gzipped.
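
The comparison described in this comment can be made mechanical with a small header diff. The sketch below uses illustrative header values that are not copied from the attached screenshots, and the header name X-Magnolia-Registration is an assumption standing in for "the Magnolia registration header".

```python
def header_diff(good: dict, bad: dict) -> dict:
    """Compare two sets of response headers, ignoring header-name case."""
    good_l = {name.lower(): value for name, value in good.items()}
    bad_l = {name.lower(): value for name, value in bad.items()}
    shared = set(good_l) & set(bad_l)
    return {
        "missing": sorted(set(good_l) - set(bad_l)),   # present only in the good response
        "added": sorted(set(bad_l) - set(good_l)),     # present only in the bad response
        "changed": sorted(name for name in shared if good_l[name] != bad_l[name]),
    }


# Illustrative values only (assumptions, not data from this issue).
good = {
    "Content-Type": "text/html;charset=UTF-8",
    "Content-Encoding": "gzip",
    "X-Magnolia-Registration": "Registered",
}
bad = {
    "Content-Type": "text/html;charset=UTF-8",
    "Transfer-Encoding": "chunked",
}

diff = header_diff(good, bad)
print(diff)
```

With these sample inputs the diff lists the encoding and registration headers as missing and the chunked transfer encoding as added, which mirrors the pattern described for the scrambled responses.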
