Scheduler / MGNLSCH-25

It should be possible to run a job on a single node of the cluster without forcing the node id.


    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Neutral
    • Affects version: 1.4.3

      I'm opening a new issue to avoid losing a problem discussed in MGNLSCH-15, copying a few comments.

      MGNLSCH-15 added the possibility to force the clusterId on which a job should run in a cluster. This is useful, but in my opinion it does not cover the average use case in a cluster. In my experience, a Magnolia cluster is usually set up as a group of cloned virtual (or physical) machines that are identical. So, when coding a scheduled job, I usually have only two use cases:

      • a job that should run on all the (active) cluster nodes. This is usually something like a temporary file cleaner or a cache refresher, i.e. something tied to each individual machine.
      • a job that must run on exactly one active node of the cluster, whichever it is. Such a job usually operates on the Jackrabbit clustered data: any node that runs it accesses the same data and changes it for all the other nodes, so the identity of the node that runs the job is not important; it just needs to be one of the active nodes when the trigger fires. Forcing a single node for the job would skip the job if the designated node happens to be offline when the trigger fires, but in my opinion a well-configured cluster should not behave differently depending on which nodes are online.

      The first use case is the current behaviour. The second is not completely covered by the clusterId, but I think it could be implemented using Jackrabbit cluster-wide locks, as in the attached AbstractClusterLockCommand.java class. The node on which the job runs is not forced: the first node that obtains the lock runs the job, and the others quietly skip it.
      There is a slight chance that the lock is not released if a node dies abruptly during the job execution, but forcing a timeout should solve the problem (unlike session-scoped locks, cluster-wide locks are not released automatically). Another improvement would be a dedicated workspace for job locks, maybe with auto-created nodes named after the jobs to serve as lock paths, but maybe I'm overthinking things. At the moment the class uses the job definition nodes in the config repository as lock paths.
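      The first-to-lock behaviour described above can be illustrated with a small self-contained sketch. It uses a java.util.concurrent lock as a stand-in for the Jackrabbit cluster-wide lock (the real command would take an open-scoped, non-session JCR lock on the job's definition node); the class and method names here are hypothetical and not part of the attached class.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ClusterLockSketch {
    // Stand-in for the cluster-wide lock on the job's definition node.
    static final ReentrantLock jobLock = new ReentrantLock();
    static final AtomicInteger executions = new AtomicInteger();
    // Counted down by every "node" whose lock attempt fails (2 of 3 here).
    static final CountDownLatch losersDone = new CountDownLatch(2);

    // Called on every cluster node when the trigger fires.
    static void onTrigger() throws InterruptedException {
        if (!jobLock.tryLock()) {
            losersDone.countDown(); // another node holds the lock: skip quietly
            return;
        }
        try {
            losersDone.await();           // hold the lock while the others try
            executions.incrementAndGet(); // the actual job body runs here
        } finally {
            jobLock.unlock(); // always release, mirroring the timeout concern above
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] nodes = new Thread[3];
        for (int i = 0; i < 3; i++) {
            nodes[i] = new Thread(() -> {
                try { onTrigger(); } catch (InterruptedException ignored) { }
            });
            nodes[i].start();
        }
        for (Thread t : nodes) t.join();
        System.out.println("executions=" + executions.get()); // prints "executions=1"
    }
}
```

      With three "nodes" firing the same trigger, whichever thread acquires the lock first holds it until the other two have tried and failed, so exactly one thread runs the job body and the program prints executions=1.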

        Acceptance criteria

              Assignee: Unassigned
              Reporter: Danilo Ghirardelli (dfghi)
              Votes: 0
              Watchers: 2

