[MGNLCACHE-254] Config to cache responses larger than 500KB Created: 01/Apr/22  Updated: 18/Apr/23  Resolved: 31/Mar/23

Status: Closed
Project: Cache Modules
Component/s: None
Affects Version/s: None
Fix Version/s: 5.9.5

Type: Improvement Priority: Major
Reporter: Tomáš Gregovský Assignee: Chuong Doan Huy
Resolution: Done Votes: 1
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: 3d 7h Time Spent: 3d 7h
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: JPEG File Blog___Magnolia_Headless_CMS.jpg     PNG File Screenshot 2023-03-30 at 17.17.46.png     PNG File Screenshot 2023-03-30 at 17.17.50.png    
Issue Links:
Cloners
Relates
Sub-Tasks:
Key | Summary | Type | Status | Assignee
MGNLCACHE-281 | Implement | Sub-task | Completed | Chuong Doan Huy
MGNLCACHE-282 | Review code | Sub-task | Completed | Dai Ha
MGNLCACHE-283 | Pre-int QA | Sub-task | Completed | Dai Ha
MGNLCACHE-284 | Final QA | Sub-task | Completed | Jaroslav Simak
MGNLCACHE-285 | DOCsub: Mention on Cache core module ... | Sub-task | Closed | Martin Drápela
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Release notes required:
Yes
Documentation update required:
Yes
Date of First Response:
Epic Link: Support
Sprint: DevX 34
Story Points: 3
Team: DeveloperX
Work Started:
Approved:
Yes

 Description   

We are using a delivery endpoint. Here is the definition:

 

class: info.magnolia.rest.delivery.jcr.v2.JcrDeliveryEndpointDefinition
workspace: blog_en_blogs2
limit: 100000
referenceDepth: 100
depth: 0
nodeTypes:
  - mgnl:composition

references:
  - name: imageReference
    propertyName: bannerImage
    referenceResolver:
      class: info.magnolia.rest.reference.dam.AssetReferenceResolverDefinition
      assetRenditions:
        - 450
        - 900
  - name: authorReference
    propertyName: authors
    referenceResolver:
      targetWorkspace: authors
      class: info.magnolia.rest.reference.jcr.JcrReferenceResolverDefinition
  - name: categoriesReference
    propertyName: categoriesFilter
    referenceResolver:
      class: info.magnolia.rest.reference.jcr.JcrReferenceResolverDefinition
      targetWorkspace: category 

This endpoint is supposed to be cached but is not. When checking the Cache Tools app, there is an entry for this endpoint, but it cannot be downloaded. My assumption is that this entry is empty.

On the front end, the endpoint is being requested without any parameters.

The loading time of this file is usually between 1.5 and 4 seconds (depending on the connection), which is a lot for a 134kB file. BTW, the browser reports 134kB for this endpoint, but that seems to be the size after compression; it is 1.1MB before compression.
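The gap between the two sizes can be reproduced outside Magnolia. The following Python sketch is an illustration only: the payload is made up and merely mimics a repetitive JSON response, and the constant is the 500KB threshold discussed in this ticket. It assumes, as suggested later in the comments, that the uncompressed size is what gets compared against the threshold.

```python
import gzip
import json

# Hypothetical payload: many similar entries, loosely modelled on a
# delivery endpoint response. The field values are invented.
items = [
    {
        "@name": f"post-{i}",
        "title": f"Blog post {i}",
        "authors": ["author-a", "author-b"],
        "categoriesFilter": ["news", "product"],
    }
    for i in range(5000)
]

raw = json.dumps({"results": items}).encode("utf-8")
compressed = gzip.compress(raw)

THRESHOLD = 500 * 1024  # 500KB, the limit discussed in this ticket

print(f"uncompressed: {len(raw)} bytes")
print(f"gzip:         {len(compressed)} bytes")
print("over threshold (uncompressed):", len(raw) > THRESHOLD)
print("over threshold (compressed):  ", len(compressed) > THRESHOLD)
```

A response that looks small in the browser's network panel can therefore still exceed a size limit applied on the server before compression.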

 

 



 Comments   
Comment by Jaroslav Simak [ 05/Apr/22 ]

Cache threshold is hardcoded here: info.magnolia.module.cache.filter.CacheResponseWrapper#DEFAULT_THRESHOLD.

 

Comment by Christopher Zimmermann [ 05/Apr/22 ]

https://docs.magnolia-cms.com/product-docs/6.2/Modules/List-of-modules/Cache-modules/Cache-core.html#_in_memory_threshold

Comment by Christopher Zimmermann [ 05/Apr/22 ]

Seems like we need to lower the threshold, or make it more configurable. Or, best of all, could it be based on the time of computation rather than just the size of the response? For example, if it takes over 1 second to compute, it should be cached?
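To make the time-based idea concrete, here is a minimal Python sketch. It is not Magnolia code: the function name `timed_render` and the `min_seconds` parameter are hypothetical, and the 1 second default is taken from the suggestion above.

```python
import time


def timed_render(render, min_seconds=1.0):
    """Run render() and report whether it was slow enough to be worth caching.

    Returns (body, elapsed_seconds, worth_caching).
    """
    start = time.perf_counter()
    body = render()
    elapsed = time.perf_counter() - start
    return body, elapsed, elapsed >= min_seconds


def fast():
    # Stands in for a response that is cheap to compute.
    return b'{"results": []}'


def slow():
    # Stands in for JCR queries and reference resolution.
    time.sleep(0.05)
    return b'{"results": []}'


# A low cut-off is used here only so the demo runs quickly.
print(timed_render(fast, min_seconds=0.02)[2])
print(timed_render(slow, min_seconds=0.02)[2])
```

Under such a policy the decision depends on how expensive the response was to produce, not on how many bytes it contains.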

Comment by Christopher Zimmermann [ 12/May/22 ]

https://git.magnolia-cms.com/projects/MODULES/repos/cache/browse/magnolia-cache-core/src/main/java/info/magnolia/module/cache/filter/CacheResponseWrapper.java

But is that CacheResponseWrapper about Magnolia caching the response, or about putting cache headers on the response so that the browser caches it?

Comment by Christopher Zimmermann [ 12/May/22 ]

tgregovsky  - your concern is about Magnolia doing the caching, correct? (Not about browser caching?)

Comment by Tomáš Gregovský [ 12/May/22 ]

Hi czimmermann, yes: Magnolia server-side caching (the same as for pages). Some delivery endpoints are too big and take a couple of seconds to load, and they are then loaded on every visit, which is a performance issue for Magnolia.

Comment by Christopher Zimmermann [ 16/May/22 ]

jsimak from this page: https://docs.magnolia-cms.com/product-docs/6.2/Modules/List-of-modules/Cache-modules/Cache-core.html#_in_memory_threshold

Do you understand this paragraph?

In memory threshold

This threshold is used to determine if a resource should be cached or not according to its size. The default value of 500K was not selected randomly, but as a result of testing that shown 98% of resources were served as fast from memory as from the repo when exceeding this value. This is mainly due to the fact that transport of such amount of data offsets time needed for accessing the repository.

Is this implying that if a response is under 500K, it's more efficient to get it from the repository than from memory? This seems unlikely to me. I'm wondering what the threshold is for. I would expect all responses to be faster from the in-memory cache, and not just faster, but also to reduce load on the templating/rendering system.

 
 

Comment by Christopher Zimmermann [ 18/May/22 ]

tgregovsky we are going to look into this in the sprint starting next week (May 23) to see if there is a bug, why there is this 500KB threshold, and how to improve caching behaviour.

In the meantime, here are some things you could do, depending on how urgent this is:

Option 1: Change your REST requests so that each response is smaller than 500KB. If it is under 500KB, it should be cached.

Option 2: Create a custom REST endpoint (requires Java) and in there change the threshold value as mentioned here: https://docs.magnolia-cms.com/product-docs/6.2/Modules/List-of-modules/Cache-modules/Cache-core.html#_in_memory_threshold "You can still change this value programmatically, for example, in your custom renderer which does time-consuming operations:"

Comment by Jaroslav Simak [ 06/Jun/22 ]

We will not increase the threshold. If there is a need for JSON responses larger than 500KB, developers should use pagination.
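A rough sketch of what client-side pagination could look like, written in Python purely for illustration. The `limit` property appears in the endpoint definition in the description; the `offset` parameter and the helper names `fetch_all` and `fake_fetch_page` are assumptions, and the HTTP call is replaced by an in-memory stand-in.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every result by requesting pages of at most page_size items."""
    results = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return results
        offset += page_size


# Stand-in for the real HTTP request: slices a fixed in-memory data set.
DATA = [{"@name": f"post-{i}"} for i in range(250)]


def fake_fetch_page(limit, offset):
    return DATA[offset:offset + limit]


all_posts = fetch_all(fake_fetch_page, page_size=100)
print(len(all_posts))
```

Each page is a separate, smaller response, so each one can stay under the size threshold, at the cost of several round trips.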

Comment by Tomáš Gregovský [ 07/Jun/22 ]

Hi jsimak, that's a pity, to be honest. Just to sum up again: there is an endpoint that is 134kB (not too big, in my opinion), but it sometimes takes up to 4 seconds to receive, probably because the data is computed (JCR query, etc.) before it is returned, every single time, for every response. In our use case we need to receive all the data at once (pagination is not an option). Also, since you can't specify which subnodes should and should not be part of the JSON data, the fact that endpoints can't be cached makes using delivery endpoints in production quite hard.

Comment by Michael Schneider [ 05/Jul/22 ]

Hi jsimak,

can you have a look at Tomas' last comment?

We could use some support on this issue as described above.

Thanks,

Michi

Comment by Pierre Sandrin [ 04/Aug/22 ]

Hello, we are running into the same issue on one of our headless projects. With all the resolved references, the response for the homepage grows to 1.6MB, with a response time of about 1.2s. The CPU load is quite high when these requests are not cached. Isn't caching all the more important the larger (and costlier) the response is? The 500K threshold doesn't make any sense to me. We would appreciate it if you could try to find a solution, since we are not the only ones having this problem. Otherwise we have to throw the delivery endpoints in the trash, which would be a pity because they are so cool!

Comment by Chuong Doan Huy [ 16/Mar/23 ]

czimmermann Yes, Magnolia cache currently uses Ehcache3, which already implements removal of entries when the content exceeds the limit.

Discovery :
+ Current situation: it is hard-coded that a response larger than 500KB is not cached in memory. In fact, a cache entry is still created, as explained in the source code: "The entry still remains in cache and serves as a token to prevent caching in the future". This explains why customers said "it looks like we do have the media queries cached, but for some reason it is never used".

+ Something we can do:
1/ Make the threshold (500KB) configurable. This should be easy to achieve, as we already have some configurable flags (e.g. refreshOnNoCacheRequests). If RAM is a concern, we can add a warning in the docs and/or reduce the maximum number of cache entries in memory (currently 10000 entries).
2/ Maybe consider enabling the disk cache mechanism in parallel for larger responses (currently we can only choose heap or disk for the underlying Ehcache).
3/ Refactor so that a response larger than the threshold is not virtually cached or displayed in the cache app.

Additional concern (needs verification): threshold calculation mismatch (Postman indicates a response size much smaller than the threshold, but the calculation in code still marks it as over the threshold).
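The "current situation" described in this comment can be modelled with a small Python sketch. This is a simplified illustration under stated assumptions, not the actual CacheResponseWrapper logic: the names `ThresholdCache` and `UNCACHEABLE` are hypothetical, and only the 500KB figure and the idea of a placeholder entry come from the discussion.

```python
THRESHOLD = 500 * 1024  # 500KB, the hard-coded limit discussed in this ticket
UNCACHEABLE = object()  # placeholder token: "seen before, too large to store"


class ThresholdCache:
    """Toy cache that stores small bodies and only a token for large ones."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.entries = {}
        self.renders = 0  # how often the expensive render step ran

    def get(self, key, render):
        entry = self.entries.get(key)
        if entry is not None and entry is not UNCACHEABLE:
            return entry  # served from memory
        self.renders += 1
        body = render()
        if entry is None:
            # First request: keep the body, or only a token if it is too large.
            too_large = len(body) > self.threshold
            self.entries[key] = UNCACHEABLE if too_large else body
        return body


def small():
    return b"x" * 1024


def large():
    return b"x" * (1024 * 1024)


cache = ThresholdCache()
for _ in range(3):
    cache.get("/small", small)
    cache.get("/large", large)

print(cache.renders)              # 4: one render for /small, three for /large
print("/large" in cache.entries)  # True: an entry exists but holds no body
```

In this model a larger `threshold` value, for example `ThresholdCache(threshold=2 * 1024 * 1024)`, lets the large response be stored after its first render, which is the effect option 1/ aims for.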

Comment by Christopher Zimmermann [ 16/Mar/23 ]

Thanks chuong.doan. I think making threshold configurable seems like a good path. Configuration must be easy to do and ideally via light development YAML file. What do you think pierre and tgregovsky ?

One question is where to configure it: globally somehow, or on the delivery endpoints?

Regarding "threshold calculation mismatch". Maybe due to compression over HTTP? https://en.wikipedia.org/wiki/HTTP_compression
If you save the response to text file maybe it has a different size?

You mention cache limit determined by # of entries (10000), I would have thought the cache limit should rather be set by amount of memory to allocate to the cache. Is that also an option? Would that not be safer, as far as preventing the cache from overwhelming the server?
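The difference between the two kinds of limit can be sketched as follows. This Python example is a toy LRU cache written for illustration, not Ehcache's implementation; the class name `BoundedCache` and its parameters are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy LRU cache limited by entry count, total bytes, or both."""

    def __init__(self, max_entries=None, max_bytes=None):
        self.max_entries = max_entries
        self.max_bytes = max_bytes
        self.items = OrderedDict()
        self.total_bytes = 0

    def put(self, key, body):
        if key in self.items:
            self.total_bytes -= len(self.items.pop(key))
        self.items[key] = body
        self.total_bytes += len(body)
        while self._over_limit():
            # Evict the least recently used entry.
            _, evicted = self.items.popitem(last=False)
            self.total_bytes -= len(evicted)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def _over_limit(self):
        if not self.items:
            return False
        if self.max_entries is not None and len(self.items) > self.max_entries:
            return True
        return self.max_bytes is not None and self.total_bytes > self.max_bytes


MB = 1024 * 1024
by_entries = BoundedCache(max_entries=10_000)
by_bytes = BoundedCache(max_bytes=4 * MB)

for i in range(10):
    body = b"x" * MB
    by_entries.put(f"/page-{i}", body)
    by_bytes.put(f"/page-{i}", body)

print(len(by_entries.items), by_entries.total_bytes // MB)  # 10 10
print(len(by_bytes.items), by_bytes.total_bytes // MB)      # 4 4
```

With an entry-count limit, memory use depends on how large the cached responses happen to be; with a byte limit, it is capped regardless of response size.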

In general my hunch is that this very large response size is rather an edge case.

I would not think adding Disk cache is necessary at this time, unless there seems to be a need from someone.

Comment by Pierre Sandrin [ 17/Mar/23 ]

1) Making it configurable would be a solution for us. Per Endpoint or in the cache config is ok.

2) I agree with Christopher that it is probably the gzip compression that reduces the size. The uncompressed version is the value that counts for the threshold.

3) I'm not too concerned about "exploding" the cache, since there are not many different REST requests to be cached that are so big. It's mainly the one to the /home page that is very big.

4) I agree that, if possible, the cache limit should be set as an amount of memory.

Comment by Chuong Doan Huy [ 17/Mar/23 ]

1) Configure it in the cache config so that it is consistent with the other cache configurations. Per-endpoint config would require a more complex refactoring, so I would opt for a global config in the cache.
2), 3) OK
4) This is already configurable, although it is not the default. Should we change the default behavior (10k entries) to something like 1 GB of memory? czimmermann pierre

For viet.nguyen comment :
1. Configurable after this ticket
2. Configurable (current system set it as 10k entries)
3. Configurable (switch between number of entries or size of Memory)
4. Configurable (switch between using in-memory or disk)
5. Agree this needs to be investigated/fixed, but it should be in a separate ticket

Comment by Christopher Zimmermann [ 20/Mar/23 ]

I think I would leave the current default as is (using ENTRIES). The reason is that I would hate to impact an existing customer's instance negatively. If a developer configures a new limit for the size of items to be cached, they can also configure those other values.

Do you see any problems if we keep the default as is?

If we change the default, I would change it in 6.3

Comment by Chuong Doan Huy [ 20/Mar/23 ]

Thanks Topher. No problem keeping the default; even better, IMHO.
So I think we can go to the next phases (grooming and implementation) with the conclusion for this ticket: make the threshold configurable in the cache configuration.
