[MGNLXAA-100] Locks are never released in some cases and timeout is not configurable Created: 29/May/17  Updated: 29/May/17

Status: Open
Project: Transactional Activation
Component/s: None
Affects Version/s: 2.3.2
Fix Version/s: None

Type: Bug
Priority: Neutral
Reporter: David Wartel
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


Description

In rare cases activation is rejected with:
Message received from subscriber: Operation not permitted, /xyz is locked
This happens when an activation fails for whatever reason and the lock is not released properly.
The code shows that cleanup is not guaranteed to run even if the content got locked:

try {
            final String utf8AuthorStatus = getHeader(request, BaseSyndicatorImpl.UTF8_STATUS);
            // null check first to make sure we do not break activation from older versions w/o this flag
            if (utf8AuthorStatus != null && Boolean.parseBoolean(utf8AuthorStatus) != SystemProperty.getBooleanProperty(SystemProperty.MAGNOLIA_UTF8_ENABLED)) {
                throw new UnsupportedOperationException("Activation between instances with different UTF-8 setting is not supported.");
            }
            final String action = getHeader(request, BaseSyndicatorImpl.ACTION);
            if (action == null) {
                throw new InvalidParameterException("Activation action must be set for each activation request.");
            }

            // verify the author ... if not trusted yet, but no exception thrown, then we attempt to establish trust
            if (!isAuthorAuthenticated(request, response)) {
                status = BaseSyndicatorImpl.ACTIVATION_HANDSHAKE;
                setResponseHeaders(response, statusMessage, status, result);
                return;
            }
            // we do not lock the content on handshake requests
            applyLock(request);
        } catch (ExchangeException e) {
            // can't obtain a lock ... this is (should be) a normal threading situation that we are
            // just reporting back to user.
            log.debug(e.getMessage(), e);
            // we can only rely on the exception's actual message to give something back to the user
            // here.
            statusMessage = StringUtils.defaultIfEmpty(e.getMessage(), e.getClass().getSimpleName());
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
            setResponseHeaders(response, statusMessage, status, result);
            return;
        } catch (Throwable e) {
            log.error(e.getMessage(), e);
            // we can only rely on the exception's actual message to give something back to the user here.
            statusMessage = StringUtils.defaultIfEmpty(e.getMessage(), e.getClass().getSimpleName());
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
            setResponseHeaders(response, statusMessage, status, result);
            return;
        }

Cleanup only occurs in the finally block of the second try...catch:

try {
            result = receive(request);
            status = BaseSyndicatorImpl.ACTIVATION_SUCCESSFUL;
        } catch (OutOfMemoryError e) {
            Runtime rt = Runtime.getRuntime();
            log.error("---------\nOutOfMemoryError caught during activation. Total memory = "
                    + rt.totalMemory()
                    + ", free memory = "
                    + rt.freeMemory()
                    + "\n---------");
            statusMessage = e.getMessage();
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
        } catch (PathNotFoundException e) {
            // this should not happen. PNFE should be already caught and wrapped in ExchangeEx
            log.error(e.getMessage(), e);
            statusMessage = "Parent not found (not yet activated): " + e.getMessage();
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
        } catch (ExchangeException e) {
            log.debug(e.getMessage(), e);
            statusMessage = e.getMessage();
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
        } catch (Throwable e) {
            log.error(e.getMessage(), e);
            // we can only rely on the exception's actual message to give something back to the user here.
            statusMessage = StringUtils.defaultIfEmpty(e.getMessage(), e.getClass().getSimpleName());
            status = BaseSyndicatorImpl.ACTIVATION_FAILED;
        } finally {
            cleanUp(request, status);
            setResponseHeaders(response, statusMessage, status, result);
        }
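
One possible direction for a fix, as a minimal self-contained sketch rather than the actual filter code: track whether the lock was really taken and release it in a single finally block, so every failure path after locking goes through cleanup. All names here (ActivationLockSketch, handle, lockExternally) are hypothetical stand-ins and not Magnolia API; the in-memory set only imitates the JCR lock state.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the receiving side; not Magnolia code. */
class ActivationLockSketch {

    /** Paths that are currently locked (imitates the repository lock state). */
    private final Set<String> lockedPaths = new HashSet<>();

    boolean isLocked(String path) {
        return lockedPaths.contains(path);
    }

    /** Simulates a lock held by some other request. */
    void lockExternally(String path) {
        lockedPaths.add(path);
    }

    /**
     * Locks the path, does the work, and always releases the lock it took.
     * failAfterLock simulates any error that happens once the lock is held.
     */
    String handle(String path, boolean failAfterLock) {
        boolean lockHeld = false;
        try {
            if (!lockedPaths.add(path)) {
                throw new IllegalStateException("Operation not permitted, " + path + " is locked");
            }
            lockHeld = true;
            if (failAfterLock) {
                throw new IllegalStateException("activation failed after locking");
            }
            return "SUCCESSFUL";
        } catch (RuntimeException e) {
            return "FAILED: " + e.getMessage();
        } finally {
            // Release only a lock taken by this call, never one held by somebody else.
            if (lockHeld) {
                lockedPaths.remove(path);
            }
        }
    }
}
```

With this shape, handle("/xyz", true) reports a failure but leaves /xyz unlocked, so a following handle("/xyz", false) succeeds. A path locked through lockExternally stays locked when handle is rejected.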

There is also no configurable lock timeout and the default is Long.MAX_VALUE, so a stale lock never expires: we need to restart the machine or unlock the nodes from a Groovy script.
From what I see, the lock timeout value comes from:
org.apache.jackrabbit.core.lock.XAEnvironment#lock(org.apache.jackrabbit.core.NodeImpl, boolean, boolean)
which calls:
lock(node, isDeep, isSessionScoped, Long.MAX_VALUE, null);
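
A configurable timeout could look roughly like the sketch below. It is an assumption-laden illustration, not existing Magnolia or Jackrabbit code: the class name, the property name and the helper methods are all hypothetical. It only shows how a configured value in seconds could replace the hard-coded Long.MAX_VALUE while keeping the current behavior as the fallback.

```java
/** Hypothetical helper; neither the class nor the property exists in Magnolia or Jackrabbit. */
class LockTimeoutSketch {

    /** Hypothetical configuration key for the lock timeout in seconds. */
    static final String TIMEOUT_PROPERTY = "activation.lock.timeoutSeconds";

    /**
     * Parses the configured timeout in seconds.
     * Missing, non-numeric or non-positive values fall back to Long.MAX_VALUE,
     * which matches the current "never expires" behavior.
     */
    static long resolveTimeoutSeconds(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Long.MAX_VALUE;
        }
        try {
            long seconds = Long.parseLong(configured.trim());
            return seconds > 0 ? seconds : Long.MAX_VALUE;
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE;
        }
    }

    /** Returns true once a lock taken at lockedAtMillis has outlived its timeout. */
    static boolean isExpired(long lockedAtMillis, long timeoutSeconds, long nowMillis) {
        if (timeoutSeconds == Long.MAX_VALUE) {
            return false; // no timeout configured, the lock never expires
        }
        long elapsedSeconds = (nowMillis - lockedAtMillis) / 1000;
        return elapsedSeconds >= timeoutSeconds;
    }
}
```

The resolved value would then be passed in place of Long.MAX_VALUE, for example resolveTimeoutSeconds(System.getProperty(TIMEOUT_PROPERTY)). Whether the repository actually honors the timeout would still need to be verified.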

