[MGNLSCH-25] It should be possible to run a job on a single node of the cluster without forcing the node id. Created: 21/Feb/12  Updated: 18/Feb/16  Resolved: 18/Feb/16

Status: Closed
Project: Scheduler
Component/s: None
Affects Version/s: 1.4.3
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Danilo Ghirardelli Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Java Source File AbstractClusterLockCommand.java    
Issue Links:
relation
is related to MGNLSCH-15 Allow tasks to specify cluster node o... Closed
is related to MGNLSCH-23 Support cluster ID in configuration o... Closed
Template:
Patch included:
Yes
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:

 Description   

I'm opening a new issue to avoid losing a problem discussed in MGNLSCH-15, copying a few comments.

MGNLSCH-15 added the possibility to force the clusterId on which a job should be run in a cluster. This is useful, but in my opinion it does not cover the average use case. In my experience, a Magnolia cluster is usually built from a set of identical cloned virtual (or physical) machines. So, when coding a scheduled job, I usually have only two use cases:

  • a job that should run on every (active) cluster node. Typically this is something machine-local, such as a temporary-file cleaner or a cache refresher.
  • a job that must run on exactly one active node of the cluster, whichever it is. Such a job usually operates on the clustered Jackrabbit data: any node that runs it accesses the same data and changes it for all other nodes, so it does not matter which node runs the job; it just needs to be one of the nodes active when the trigger fires. Pinning the job to a single node would skip it entirely if the designated node happens to be offline when the trigger fires, but in my opinion a well-configured cluster should not behave differently depending on which nodes are online.

The first use case is the current behaviour. The second is not covered by the clusterId mechanism, but I think it could be implemented using Jackrabbit cluster-wide locks, as in the attached AbstractClusterLockCommand.java class. The node on which the job runs is not forced: the first node that obtains the lock runs the job, and the others quietly skip it.
There is a slight chance that the lock is not released if a node dies abruptly during job execution, but enforcing a timeout should solve that problem (unlike session-scoped locks, cluster-wide locks are not released automatically). A further improvement would be a dedicated repository for job locks, perhaps with auto-created nodes named after each job to serve as the lock paths, but maybe I'm overthinking things. At the moment the class uses the job definition nodes in the config repository as lock paths.
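The acquire-or-skip pattern described above can be sketched with standard `java.util.concurrent` primitives. This is only an in-JVM analogy of the attached AbstractClusterLockCommand.java: the `ReentrantLock` stands in for a Jackrabbit cluster-wide JCR lock on the job's definition node, and the latch simulates the trigger firing on all nodes at once. All names here (`ClusterLockSketch`, `onTriggerFired`) are hypothetical, not part of the Scheduler module.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ClusterLockSketch {
    static final int NODES = 3;
    // Stand-in for a Jackrabbit cluster-wide lock on the job's definition node.
    static final ReentrantLock jobLock = new ReentrantLock();
    // Makes the winner hold the lock until every node has attempted it,
    // mimicking a trigger that fires on all cluster nodes simultaneously.
    static final CountDownLatch attempts = new CountDownLatch(NODES);
    static final AtomicInteger executions = new AtomicInteger();

    static void onTriggerFired(String nodeId) {
        boolean acquired = jobLock.tryLock(); // first node to get the lock wins
        attempts.countDown();
        if (!acquired) {
            System.out.println(nodeId + " skipped: another node holds the lock");
            return;
        }
        try {
            attempts.await();               // keep the lock while peers attempt
            executions.incrementAndGet();   // the job runs exactly once
            System.out.println(nodeId + " ran the job");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Always release; with a real cluster lock a timeout would guard
            // against a node dying mid-job, since such locks are not released
            // automatically.
            jobLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] nodes = new Thread[NODES];
        for (int i = 0; i < NODES; i++) {
            String id = "node-" + i;
            nodes[i] = new Thread(() -> onTriggerFired(id));
        }
        for (Thread t : nodes) t.start();
        for (Thread t : nodes) t.join();
        System.out.println("executions=" + executions.get()); // executions=1
    }
}
```

Whichever node wins the race runs the job; the others skip without error, which is exactly the "any one active node" semantics requested, without ever designating a node id.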



 Comments   
Comment by Michael Mühlebach [ 18/Feb/16 ]

Given the thousands of other open issues that are more highly requested, we won't be able to address this one in the foreseeable future. Instead we will focus on issues with higher impact and more votes.
Thanks for taking the time to raise this issue. As you are no doubt aware, it has been on our backlog for some time now with very little movement.
I'm going to close it to set expectations, so the issue doesn't stay open for years with few updates. If it is still relevant, please feel free to reopen it or create a new issue.

Generated at Mon Feb 12 10:45:10 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.