|
For years now, the company has relied on core services (Jenkins & Nexus are the closest to our field of work, but there are others) that are accidents waiting to happen, for the following reasons:
- few people know how they work, and those who do have limited availability
- they are not managed using infrastructure as code or configuration as code, meaning changes aren't transparent
- in the case of Nexus 2, it's reaching EoL
- while they seem to run fine apart from the odd glitch here & there, a real accident could have devastating consequences
For those reasons, I would like us to start exploring deploying those applications ourselves. Using configuration as code, Docker images & AWS resources, it should be fairly easy to reach a production-like setup. See the following resources:
Open questions:
- should we use K8s & Helm charts?
  - no, according to the SREs: plain instances will be easier
- are we comfortable doing this? This requires experimenting and getting a sense of the amount of work.
- assuming we manage to do it, who would then look after those instances? Should we start an internal on-call rotation? Or do we hand it off to the SREs?
- at some point, pitch it to SREs/ITI & align with them
- overlap with SREs: in order for the SREs to fully control the SaaS, they would need control over that theoretical Nexus instance to fulfil SLAs/SLOs
- any business impact? If so, inform the PM
|