[BUILD-1003] Rollback from triggerRemoteJob plugin to GWT plugin Created: 26/Jan/23 Updated: 03/Feb/23 |
|
| Status: | Open |
| Project: | Build |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Neutral |
| Reporter: | Maxime Michel | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Issue Links: | |
| Template: | |
| Acceptance criteria: | Empty |
| Date of First Response: | |
| Description |
|
As discussed in today's Testing&QA meeting, we uncovered two issues with the triggerRemoteJob plugin that don't exist with the Generic Webhook Trigger plugin:
|
| Comments |
| Comment by Rubén Martín Romero [ 26/Jan/23 ] |
|
Regarding the first point, this seems odd to me, since the behavior should be equivalent to the build trigger, and in that case, AFAIK, the job is queued by default. In any case, I had a look at the triggerRemoteJob plugin options and found one that could help us here, `preventRemoteBuildQueue` ("Prevent Remote Build Queue: wait to trigger remote builds until no other builds are running"; optional, default: false). Example:
`triggerRemoteJob blockBuildUntilComplete: false, job: '<remote_job>', preventRemoteBuildQueue: true, useCrumbCache: true, useJobInfoCache: true`
OTOH, can you share a job in Core Jenkins that uses this remote trigger plugin so I can run some tests?
With respect to the second point, we can easily add (and we should) a build description to the triggered job (the one in SRE Jenkins) that checks whether the triggering user is sre and, if so, sets a message like "Remote triggered", plus the timestamp or any other information you find useful. We already do this in quite a few jobs, and you can use these examples as reference:
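A minimal sketch of the build-description idea for the triggered job in SRE Jenkins. It assumes that Core Jenkins triggers the job as a Jenkins user whose ID is `sre` (so the build cause is a `UserIdCause`); the description text is illustrative only, and this is not necessarily how the existing SRE jobs do it.

```groovy
// Sketch for the triggered job in SRE Jenkins (scripted pipeline, or a `script {}` block in declarative).
// Assumption: Core Jenkins triggers the job as the Jenkins user whose ID is 'sre'.
boolean triggeredBySre = false
for (cause in currentBuild.getBuildCauses('hudson.model.Cause$UserIdCause')) {
    if (cause.userId == 'sre') {
        triggeredBySre = true
    }
}
if (triggeredBySre) {
    // Shown next to the build number in the job's build history
    currentBuild.description = 'Remote triggered from Core Jenkins'
}
```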
|
| Comment by Maxime Michel [ 30/Jan/23 ] |
The difference is that the GWT pings a proxy pipeline in which we call `build`, which is standard Jenkins behavior, hence it adds the build to the queue. With triggerRemoteJob, though, logic in the plugin decides not to trigger anything unless certain conditions are met.
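For comparison, a minimal sketch of what the GWT-triggered proxy pipeline boils down to: it only calls the standard `build` step, so the request goes through the regular Jenkins build queue. The job path is a placeholder, and `wait: false` is an assumption about how the proxy is configured (it finishes as soon as the build is scheduled).

```groovy
// Sketch of the proxy pipeline in SRE Jenkins that the Generic Webhook Trigger starts.
// '<target_job>' is a placeholder for the real downstream job path.
// The standard `build` step hands the request to the Jenkins build queue;
// wait: false lets the proxy finish without waiting for the target build.
build job: '<target_job>', wait: false
```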
I'm not sure I understand the option above, so some experimenting could indeed be useful (see below); however, I'm afraid that if it waits for the upstream build to finish, that's going to make core CI even longer, and that's not acceptable.
You can make the change on one of the relationships we have registered (boms - cloud-webapp) here: https://git.magnolia-cms.com/projects/BUILD/repos/pipeline-templates/browse/vars/magnoliaDefaultPipeline.groovy#237
Then trigger a build here: https://jenkins.magnolia-cms.com/job/build/job/boms/job/master/
This is better than the current situation; however, for optimal debugging we would ideally want to know which upstream is the exact culprit (which job and which build number). That is not possible when the only data available is the username of the bot that triggered the job and the date, is it? Here are the pairs that are currently registered:
|
| Comment by Rubén Martín Romero [ 02/Feb/23 ] |
|
Thank you for the update, mmichel, and also for the job you provided to test the remote trigger. Very useful! I ran some tests this afternoon, when activity in SRE Jenkins is much calmer, and the first thing I verified is that the jobs triggered from Core Jenkins are correctly queued in SRE Jenkins, which is actually standard Jenkins behavior regardless of what triggers the job. So although you can see this message in https://jenkins.magnolia-cms.com/job/build/job/boms/job/master/156/console:
"The remote job is blocked. Build #2,563 is already in progress (ETA: 26 min)."
I have verified that the job is correctly queued on the SRE Jenkins side:
Therefore, we don't even need to consider adding any additional option to the remote trigger call performed from Core Jenkins, since this statement doesn't match the actual behavior:
Regarding the second point and your last update:
How were you previously getting all that information with the GWT? Maybe by adding it as variables to the request and referencing them in the cause configured in the target webhook job (the intermediate one created in SRE Jenkins)? Even if that is the case, I still don't think it justifies rolling back to the previous GWT-based solution just for this, since adding a build description that indicates when the build was triggered remotely (from Core Jenkins) should IMHO be enough for us... WDYT, mmichel, about giving this option a chance and seeing how it works? I'm saying this because we already had an internal discussion (in SRE) about uninstalling the GWT from our Jenkins, since right now no job uses that feature, and this is still on the table :S |
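A sketch of the "variables in the request, referenced in the cause" idea for the intermediate webhook job in SRE Jenkins. It assumes Core Jenkins POSTs a JSON body with hypothetical `job` and `build` fields; the variable names, token, and target job path are placeholders, not the configuration that was actually used.

```groovy
// Sketch of the intermediate webhook job in SRE Jenkins (declarative pipeline).
// Assumption: Core Jenkins POSTs a body such as {"job": "build/boms/master", "build": "156"}.
pipeline {
    agent any
    triggers {
        GenericTrigger(
            genericVariables: [
                // Extract the upstream details from the POST body with JSONPath
                [key: 'UPSTREAM_JOB', value: '$.job'],
                [key: 'UPSTREAM_BUILD', value: '$.build']
            ],
            // Recorded as the cause of each build started through the webhook
            causeString: 'Triggered by Core Jenkins: $UPSTREAM_JOB #$UPSTREAM_BUILD',
            token: 'hypothetical-token',
            printContributedVariables: true
        )
    }
    stages {
        stage('Trigger target job') {
            steps {
                // '<target_job>' is a placeholder for the real downstream job path
                build job: '<target_job>', wait: false
            }
        }
    }
}
```

Since the `build` step records an upstream cause on the target build pointing back to this intermediate build, whose cause in turn names the Core Jenkins job and build number, a developer could follow the chain from the failing build to its source.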
| Comment by Maxime Michel [ 03/Feb/23 ] |
|
As for the plugin actually triggering jobs and queuing them, I don't think 'manual testing on a quiet afternoon' is representative of what the plugin does in production. After all, we all saw it not happen in front of us during the meeting.
I think it was raised as a pain point for everybody in a Foundation-DevX meeting 10 days ago, and again yesterday during the #testing-qa meeting, that in general it's not clear enough which pipelines trigger which. In the particular case of cross-Jenkins builds, as I explained above, multiple source jobs on core Jenkins may trigger target jobs on SRE Jenkins, hence a generic description doesn't cut it. If a developer is trying to troubleshoot why a build is failing and the source is upstream in core Jenkins, with your solution they wouldn't know which source job it is, let alone which build number. Knowing the build number could tie it to the commit that triggered all of it. PS: it's been a week now… and I can no longer find the explicit request in the Foundation-DevX notes. So I'm not sure anymore whether DevX requested this from me, whether we agreed to do it during the meeting, or whether it was only discussed with mgeljic. Let me unassign myself anyway until this is requested again, because I have better things to do than argue about pluginA vs pluginB. To be honest, you should keep the complexity of your domain to yourself; as a non-SRE, I don't need to know that you want to install or uninstall one additional plugin. |
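A sketch of the sending side that would let each triggered build be traced back to the exact Core Jenkins job and build number. It assumes the HTTP Request plugin (`httpRequest` step) is available in Core Jenkins and that SRE Jenkins exposes the Generic Webhook Trigger endpoint; the host, token, and payload field names are placeholders, not an existing setup.

```groovy
// Sketch: step in a Core Jenkins pipeline that notifies SRE Jenkins with its own identity.
// Assumptions: HTTP Request plugin installed; '<sre-jenkins>' and the token are placeholders.
import groovy.json.JsonOutput

def payload = JsonOutput.toJson([
    job: env.JOB_NAME,       // full job name, e.g. build/boms/master
    build: env.BUILD_NUMBER, // upstream build number, ties back to the triggering commit
    url: env.BUILD_URL       // direct link to the upstream build
])

httpRequest(
    url: 'https://<sre-jenkins>/generic-webhook-trigger/invoke?token=hypothetical-token',
    httpMode: 'POST',
    contentType: 'APPLICATION_JSON',
    requestBody: payload,
    validResponseCodes: '200'
)
```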
| Comment by Rubén Martín Romero [ 03/Feb/23 ] |
|
Hi mmichel, thanks for your update! Regarding this:
Can you please point me to the case in which the job was not correctly queued? That is standard Jenkins behavior that should always work regardless of the load Jenkins is dealing with, so if it is not working as expected in some cases, we would need to look into that issue in order to fix it. In any case, we will also keep an eye on this from our side. BTW, just to clarify, Jenkins only queues one build per job, so if a build is already queued, Jenkins will never add another build to the queue for that job. Could this match the case you saw this morning? |