[MAGNOLIA-8760] Usage of PostgreSQL to publish heavy data loads increases toast tables size too much Created: 13/Feb/23  Updated: 04/Jul/23  Resolved: 08/Mar/23

Status: Closed
Project: Magnolia
Component/s: None
Affects Version/s: 6.2.28
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Roberto Gaona Assignee: Daniel Alonso
Resolution: Done Votes: 0
Labels: cs-bk
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File 4_5k.png     Java Source File BatchPublicationCommand.java     PNG File after_vacuum.png     PNG File before_vacuum.png     PNG File chart.png     PNG File image-2023-02-27-15-30-37-729.png     PNG File image-2023-02-27-15-31-26-772.png     PNG File image-2023-02-28-10-15-31-357.png     PNG File image-2023-02-28-10-16-05-662.png     PNG File image-2023-03-01-17-22-16-421.png     Text File pg_config_parameters_public.txt     XML File pom.xml     XML File postgres-repo-conf.xml     XML File repositories.xml    
Issue Links:
Problem/Incident
Relates
relates to PUBLISHING-202 Not able to publish huge amount of it... Closed
causality
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MAGNOLIA-8794 Implementation Sub-task Completed Daniel Alonso  
MAGNOLIA-8801 Review Sub-task Completed Fernando Cherchi  
MAGNOLIA-8807 Docu Sub-task Completed Adrian Brooks  
MAGNOLIA-8814 DOC: review Documentation Task Completed Adrian Brooks  
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[X]* Steps to reproduce, expected, and actual results filled
[X]* Affected version filled
Release notes required:
Yes
Documentation update required:
Yes
Date of First Response:
Visible to:
Christian Menzel, Fabian Mangold, Jan Haderka
Epic Link: Performance problems with too many child nodes
Sprint: Nucleus 31
Story Points: 5
Team: Nucleus
Work Started:

 Description   

It has been reported that using PostgreSQL and trying to publish any high amount of data/number of sites or pages will make the toast table increase in size up to occupying the whole available DB.

This issue has been reported on PostgreSQL 11.9 when the user tried to publish 8000 sites and ended up filling 15GB of DB available size when trying to publish 120MB of data.

You can find the used configuration parameters attached to this ticket.

There are some workarounds that can be used, like performing an export/import of the data for a first heavy load publication instead of performing a normal publish, executing a vacuum of the toast table after the publish or directly increasing the DB size so it's not filled during the publication.



 Comments   
Comment by Jan Haderka [ 21/Feb/23 ]

One possible clue to the problem might be that it wasn't happening roughly about 12months ago (+/-) so perhaps we can narrow down the issue to what has changed in that time. Something in the product or drivers or db or infra used to run it?

Could it be adverse effect of some 3rd party lib or some seemingly innocuous code change?

The effect is made more pronounced when working with either flat structures or high amounts of data. What is so special about it except the validation overhead and amount of memory consumed by the data and traffic generated by retrieving it from DB?

When checking the internet for underlying causes of DB failing to reclaim space itself, it often comes down to long living transactions or other live long living connections to DB that keep claim on parts of the temp space hence preventing db from reclaiming block of garbage from memory as it would reclaim only space between last used block and end of the temp heap. What could such connection be?

While first detected on db_datastore used as backing storage for binaries, it can be seen on other tables too (as mentioned above - those with big data or a lot of flat data). Is there anything else those have in common?

... just some food for thought.

Comment by David Martin [ 04/Jul/23 ]

Hi Folks,

This bug was closed, what was it's resolution? Was it fixed, is there a workround?

Cheers, Dave

Generated at Mon Feb 12 04:35:33 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.