[MAGNOLIA-3671] User locked under heavy load. Created: 28/Apr/11  Updated: 19/Jul/11  Resolved: 16/May/11

Status: Closed
Project: Magnolia
Component/s: security
Affects Version/s: 4.4.3
Fix Version/s: 4.4.4

Type: Bug Priority: Blocker
Reporter: Danilo Ghirardelli Assignee: Ondrej Chytil
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Java Source File JCRAuthenticationModule.java     Java Source File MgnlUser.java     Text File stacktrace.txt    
Issue Links:
causality
caused by MAGNOLIA-3557 Implement automatic account lockout a... Closed
relation
is related to DOCU-148 Account lockout after failed attempts Closed
is related to MAGNOLIA-3683 Improve account lockout Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Bug DoR:
[ ]* Steps to reproduce, expected, and actual results filled
[ ]* Affected version filled
Date of First Response:

 Description   

I updated to Magnolia 4.4.3, my frontend configuration is clustered, two public instances.
When the load rises, sometimes this exception below happens. This is particularly problematic because after the problem the instance is completely locked, and shows the magnolia login.
This seems to be caused by modification done in MAGNOLIA-3557, that stores the access count. Doing so for the anonymous user seems to cause concurrent modification. I'll do further investigation, but this is a real blocking problem for me because my instances are failing often.

ERROR info.magnolia.cms.security.SystemUserManager 28.04.2011 10:35:53 – Failed to login as anonymous user
javax.security.auth.login.LoginException: java.lang.RuntimeException: javax.jcr.InvalidItemStateException: 2d78094b-8f7e-4c95-8b1d-22e3dc417c34/{}failedAttempts has been modified externally
at info.magnolia.cms.security.MgnlUser.setFailedLoginAttempts(MgnlUser.java:96)
at info.magnolia.jaas.sp.jcr.JCRAuthenticationModule.matchPassword(JCRAuthenticationModule.java:140)



 Comments   
Comment by Danilo Ghirardelli [ 28/Apr/11 ]

As a temporary solution for the problem you can set to zero /server/security/userManagers/*/maxFailedLoginAttempts, so the feature is ignored.
At least my production is usable now, so you can lower the priority, but the problem is still there, the only way I found to replicate this is to try a high load on a public cluster (so there is a high chance of concurrency).

Comment by Danilo Ghirardelli [ 28/Apr/11 ]

No, I was wrong. Public cluster still crashes with the same exception, we are now going with a single machine (that way seems ok, there is no concurrency). My patch now is just to comment the entire content of setFailedLoginAttempts method.

Comment by Philipp Bärfuss [ 28/Apr/11 ]

This is really critical, the anonymous user should not be updated.

Comment by Danilo Ghirardelli [ 29/Apr/11 ]

There are a couple of things that I don't understand about the setFailedLoginAttempts: first, it throws a RuntimeException that is never catched. Many other methods in the same class have the same throw but with a TODO attached, the problem is that the exception will not simply fail a single request (which is bad anyway but "recoverable"), but blocks completely the user, which is really bad for the anonymous or superuser ones...
Second, there is a "mutex" object used for syncronization in setLastAccess method, that should be the right way to avoid concurrency problems, but still I can't see how this will avoid concurrency problems when in clustering. The right part about setLastAccess is that it never throws exceptions, a similar approach for other methods (or at least for the setFailedLoginAttempts) should be better than locking the user completely.

Comment by Ondrej Chytil [ 29/Apr/11 ]

Hi Danilo,

update of failedAttempts is now skipped for anonymous user to avoid this issue.
Also exception handling in setting method is handled without throwing RuntimeException.
I attached modified classes so you can use them right away.

Comment by Danilo Ghirardelli [ 29/Apr/11 ]

Uhm... I haven't tried it, but I see there are three calls to setFailedLoginAttempts in the matchPassword, and only one is skipped for anonymous. Given that the default max retries is 5, the other two calls may be still enabled, even if I don't know if a wrong login for anonymous is possible.

Comment by Ondrej Chytil [ 29/Apr/11 ]

Not by normal accessing page without login. But I tried (and unfortunately succeeded) to actually login as "anonymous" with random string as password and then it's possible.
So to be sure I'm re-opening ticket to completely disable this for anonymous.
Thanks for pointing that out.

Comment by Danilo Ghirardelli [ 29/Apr/11 ]

Just to be curious, in a simple Magnolia config with just the superuser, what happens if someone tries the password too many times? Will the superuser be locked out so the instance is unrecoverable (or unpublishable if it's a public one)? Or is there a check that a user with superuser role should always be active in the system?

Comment by Ondrej Chytil [ 29/Apr/11 ]

With default settings the account is locked (after 5 wrong attempts) and by normal means instance is unaccessible (considering there is only superuser and anonymous). There is possibility to re-enable superuser using groovy or scripts.
This possibility to disable superuser account is reason why whole lockout is configurable separately for System user accounts with option to disable it entirely so you can choose what settings fits you best regarding security x accessibility.

Comment by Boris Kraft [ 06/May/11 ]

By the way, 5 attempts are a too low default. There is little reason not to use a default that is high enough to normally not fire, but still avoid brute force attacks. In other words, even a default of 100 would still be sufficiently safe. In real life I have read case studies where the default has been set to 10 (from 5 or a similar low number) and this has led to a drastic decrease in help desk tickets. 5 is definitely too low.

Secondly, if you ask me there is no need to store this count anywhere but in memory; and no reason to sync between even clustered instances. If you follow the first comment that the problem is not to find the minimum bearable number of failed attempts but to avoid brute force attacks, it really makes no difference if you can try 10 times on one machine or 10 times on each machine (provided you know how to access them directly, if at all possible). This should eliminate the problem I hope.

Comment by Danilo Ghirardelli [ 10/May/11 ]

I tried the attached patch in our clustered test and seems fine even under heavy load, we'll see in production in a few days.

As a side note, I completely agree with Boris, the limit is low (for the average non-technical user), and there is really little need to save this data across the cluster.

Comment by Ondrej Chytil [ 10/May/11 ]

I created MAGNOLIA-3683 ticket to implement your remarks. Especially that default number of max failed login attempts seems to be more annoying from user side than I thought before so it'll be increased for next release.

Comment by Nils Breunese [ 11/May/11 ]

I'd just like to call a 'me too' on this issue. We have had our instances showing login pages to visitors a couple of times now since we've upgraded to Magnolia 4.4.3 in production. I haven't found 'failedAttempts' in our logs, but we do get 'Failed to login as anonymous user' from SystemUserManager.

For now I have set config:/server/security/userManagers/*/maxFailedLoginAttempts to 0 as suggested by Danilo in the first comment, let's hope this works.

Comment by Danilo Ghirardelli [ 11/May/11 ]

I'm sorry to say that it won't work, I suggested it and in the next comment I said that wasn't a solution, concurrency problems are still there, but seems you are luckier than me because the login appears just a couple of times and does not lock completely the anonymous user.

The only solution is to turn off all instances except one, or using the two classes attached to "patch" the problem, but for this you'll have to do a release anyway.

Comment by Philipp Bärfuss [ 12/May/11 ]

Hi Nils,

Are you also using a clustered repository or are you confronted by the issue on a 'normal' installation? I will definitely push the 4.4.4 release now that this starts to be a more global issue.

Comment by Nils Breunese [ 12/May/11 ]

Hi Philipp, we have a setup with one author and two public instances and the forum workspace is clustered.

I guess we'll be doing a production branch update with the 'patched' classes today.

Comment by Ondrej Chytil [ 16/May/11 ]

Re-opening to improve the fix furthermore.

Generated at Mon Feb 12 03:48:35 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.