[MGNLCACHE-228] Range requests from Facebook don't work Created: 09/Jan/19  Updated: 30/Nov/20  Resolved: 29/Apr/20

Status: Closed
Project: Cache Modules
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Frank Sommer Assignee: Evzen Fochr
Resolution: Workaround exists Votes: 2
Labels: maintenance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: config.server.filters.gzip.bypasses.facebookCrawler.yaml
Issue Links:
Problem/Incident (documentation): to be documented by DOCU-1998 Document workaround for issue with re... (Closed)
Relates (relation): is related to MAGNOLIA-4713 Uncached pages are sent with incorrec... (Closed)
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Documentation update required:
Yes
Date of First Response:
Epic Link: Support
Sprint: Maintenance 4
Story Points: 5

 Description   

Timebox for investigation: 5 SP

Facebook sharing does not work for us on several Magnolia websites. Facebook uses range requests to read the Open Graph (og) tags from the pages. I think there is a problem in the processing of range requests in the RangeSupportFilter and the Magnolia cache.
What is strange is that range requests with parameters work, because they bypass the cache; range requests without a parameter come back with no body content.
I can't reproduce this on demopublic. The only thing I can see is that Magnolia delivers more content without parameters than with parameters. Furthermore, the Content-Length is set to Integer.MAX_VALUE.

See also https://groups.google.com/a/magnolia-cms.com/forum/#!topic/dev-list/UORUG13RwoQ
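A minimal sketch of the comparison described above, assuming Java 11's built-in HttpClient, a placeholder URL, and a hypothetical cache-busting query parameter:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeRequestRepro {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Placeholders: the same page once served from the Magnolia cache
        // and once bypassing it via an arbitrary query parameter.
        String[] urls = {
                "https://example.com/some-page.html",
                "https://example.com/some-page.html?nocache=1"
        };
        for (String url : urls) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Range", "bytes=0-40959") // crawler-style partial request; the exact range is an assumption
                    .build();
            HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            System.out.printf("%s -> status=%d, Content-Length=%s, body=%d bytes%n",
                    url, response.statusCode(),
                    response.headers().firstValue("Content-Length").orElse("(none)"),
                    response.body().length);
        }
    }
}

On an affected site, the parameterless (cached) variant would be the one expected to show the empty body and the oversized Content-Length.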



 Comments   
Comment by Christopher Zimmermann [ 14/Jan/19 ]

I could imagine that this is the culprit:

https://stackoverflow.com/a/25131439

"Facebook scraps ALWAYS the first 40k of data of a given page. That means if your page is more than that (the compressed one) then it will be partially downloaded, and because is compressed it will be broken. Needs some logic so that when the request is from Facebook, page will not be compressed"

Comment by Christopher Zimmermann [ 14/Jan/19 ]

Another related forum thread:

https://groups.google.com/a/magnolia-cms.com/forum/#!topic/dev-list/lCA4nL4IcLE

Comment by Joerg von Frantzius [ 15/Jan/19 ]

Makes you wonder why Facebook would request a 40K range with gzip compression if they cannot decode it anyway?

https://stackoverflow.com/a/53135659 says:

It's possible to request gzipped content over a range, but only with Transfer-Encoding, and not with Content-Encoding.

I guess that's what Facebook is doing, i.e. using Transfer-Encoding with their Range request, so they'll be able to decode the partial response.
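As a sketch of the two header combinations (placeholder URL; whether a given server honors TE: gzip is a separate question):

import java.net.URI;
import java.net.http.HttpRequest;

public class RangeEncodingHeaders {
    public static void main(String[] args) {
        URI uri = URI.create("https://example.com/some-page.html"); // placeholder

        // Content-Encoding is a property of the stored representation: a Range
        // then addresses bytes of the gzipped entity, so a 40k slice of it
        // cannot be decoded on its own.
        HttpRequest contentEncodingStyle = HttpRequest.newBuilder(uri)
                .header("Range", "bytes=0-40959")
                .header("Accept-Encoding", "gzip") // server may reply with Content-Encoding: gzip
                .build();

        // Transfer-Encoding is per hop: the range is taken from the uncompressed
        // representation and only the transfer is gzipped, so the partial
        // response stays decodable.
        HttpRequest transferEncodingStyle = HttpRequest.newBuilder(uri)
                .header("Range", "bytes=0-40959")
                .header("TE", "gzip") // server may reply with Transfer-Encoding: gzip
                .build();

        System.out.println(contentEncodingStyle.headers().map());
        System.out.println(transferEncodingStyle.headers().map());
    }
}

If Magnolia only ever compresses via Content-Encoding, that would match the broken partial responses described above.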
