[MGNLMAIL-116] There should be a timeout on SMTP requests Created: 28/Sep/21  Updated: 01/Nov/21  Resolved: 18/Oct/21

Status: Closed
Project: Magnolia Mail Module
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.9

Type: Bug Priority: Neutral
Reporter: Maxime Michel Assignee: Michael Duerig
Resolution: Fixed Votes: 0
Labels: VN-Maintenance, artt
Remaining Estimate: Not Specified
Time Spent: 1h
Original Estimate: Not Specified

Issue Links:
Relates
relation
is related to MGNLMAIL-117 DOC: Calls to the SMTP server now tim... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[X]* Doc/release notes changes? Comment present?
[X]* Downstream builds green?
[X]* Solution information and context easily available?
[X]* Tests
[X]* FixVersion filled and not yet released
[X]  Architecture Decision Record (ADR)
Bug DoR:
[X]* Steps to reproduce, expected, and actual results filled
[X]* Affected version filled
Documentation update required:
Yes
Date of First Response:

 Description   

As we found out the hard way on the cloud, SMTP requests never time out by default. When the requests pile up, the mail server gets overloaded, which in turn causes Magnolia itself to crash. We now provide a default timeout of 5 seconds, which can be overridden via the timeoutInMillis field of SmtpConfiguration.
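For context, a minimal sketch of how a single timeout value could be applied to a JavaMail session. The property names are the standard JavaMail SMTP timeout settings, which default to infinite. The class and method names here are hypothetical, and how the module actually maps timeoutInMillis onto these properties is an assumption, not taken from the fix itself.

```java
import java.util.Properties;

// Hypothetical helper, not the module's actual code.
class SmtpTimeoutSketch {

    // Mirrors the 5 second default mentioned in the description.
    static final int DEFAULT_TIMEOUT_MILLIS = 5_000;

    // Builds session properties in which connect, read, and write
    // operations all give up after the same number of milliseconds.
    static Properties smtpProperties(String host, int port, int timeoutInMillis) {
        Properties props = new Properties();
        props.setProperty("mail.smtp.host", host);
        props.setProperty("mail.smtp.port", Integer.toString(port));

        String timeout = Integer.toString(timeoutInMillis);
        // Standard JavaMail SMTP properties; each one is infinite when unset.
        props.setProperty("mail.smtp.connectiontimeout", timeout); // socket connect
        props.setProperty("mail.smtp.timeout", timeout);           // socket read
        props.setProperty("mail.smtp.writetimeout", timeout);      // socket write
        return props;
    }
}
```

The resulting Properties object would typically be passed to javax.mail.Session.getInstance(...) when the session is created.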



 Comments   
Comment by Michael Duerig [ 04/Oct/21 ]

Analysis

  • There are 498 threads contending for the instance monitor at SimpleMailHandler#sendMail. The thread that is holding that monitor looks ok and is happily progressing. Removing synchronized from SimpleMailHandler#sendMail will not improve the situation, as this would only shift the contention to locks further down the call chain (i.e. javax.mail.Service.connect).
  • All 498 threads have the same stack trace showing jdk.nashorn.internal.scripts.Script$Recompilation$5002$1442AAAAAA$^eval_.ContactForm#sendEmail(jdk.scripting.nashorn.scripts/<eval>:76) as the source for sending those mails.
  • From the information available, I cannot determine whether the contention was caused by a slow SMTP server or by too many sendMail calls in an unreasonably short amount of time.
  • From the product side there is not much we can do in a situation like this except for bailing out (i.e. degrade gracefully instead of impacting the whole instance). That is, we would start to fail when sending mails once the service is contended.
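
The bail-out idea from the last point could look roughly like the sketch below. It is only an illustration: BoundedMailSender and trySend are hypothetical names, not part of SimpleMailHandler. A caller that cannot get the single send permit within a bounded wait fails fast instead of queuing on the monitor indefinitely.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of failing sendMail calls under contention.
class BoundedMailSender {

    // One permit: at most one send in flight, like the synchronized method.
    private final Semaphore permit = new Semaphore(1);
    private final long maxWaitMillis;

    BoundedMailSender(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    // Runs sendAction if the permit becomes free within maxWaitMillis.
    // Returns false (mail not sent) when the sender is contended.
    boolean trySend(Runnable sendAction) {
        boolean acquired;
        try {
            acquired = permit.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        if (!acquired) {
            return false; // contended: bail out instead of piling up
        }
        try {
            sendAction.run();
            return true;
        } finally {
            permit.release();
        }
    }
}
```

With this shape, the 498 waiting threads would each give up after maxWaitMillis rather than blocking the instance. The caller has to handle the false result, for example by reporting the failure to the form submitter.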

Conclusion

I propose to close this ticket until further information is available:

  • Problem occurs frequently
  • Access logs, and SMTP server logs for correlating times and frequency of these sendMail calls.
  • Conditionally on the above: degrade gracefully by failing sendMail calls in a contention scenario.

 

Comment by David Lopez [ 05/Oct/21 ]

Insights from chanh.hua

  1. Problem occurs frequently: It happened at least twice in the last 2 weeks (the subscription may have had the issue before, but at that time we did not investigate deeply).
  2. Access logs, and SMTP server logs for correlating times and frequency of these sendMail calls: We don’t host the SMTP server for them, so we would need to ask the customer.
  3. Conditionally on the above: degrade gracefully by failing sendMail calls in a contention scenario.

This option sounds good: setting a timeout (or similar) on the mail client side would help limit the impact on the whole instance. IMO, option 3 is a good option to go with (it is always good to set a proper timeout on calls to external services, isn’t it?). If we would like to have the logs of the SMTP server, we can wait until the next issue occurs and ask the customer for them (but we cannot guarantee that they would provide the logs).
 
Source: https://magnolia-cms.slack.com/archives/CDF2T239Q/p1633406142177800?thread_ts=1632820188.141700&cid=CDF2T239Q 

 

Generated at Mon Feb 12 06:03:48 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.