[ABTEST-439] Integrate ABn Testing infra with Datadog for logging and alerting Created: 13/Apr/21  Updated: 26/Aug/22

Status: Open
Project: A/B Testing
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Minor
Reporter: Nguyen Phung Chi Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Template:
Acceptance criteria:
[ ]* Integrate with Datadog
[ ]* Have logging and alarms implemented
[ ]* Configure the alarms to notify the SRE and ABn Testing teams
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Epic Link: ABn Later
Team: AuthorX

 Description   

Current situation

We have implemented basic monitoring and alerting using AWS services only (ABTEST-363).

This is acceptable for the Public Beta, as we can still trace errors and incidents. However, it is not fully managed, and the monitoring is not centralized in one place.

Solution

  • Integrating with Datadog would give us more insights and metrics about what is happening in the infrastructure.
  • The SRE team could monitor the infrastructure and report issues to the team, so that incidents are resolved more efficiently.
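A minimal sketch of the alerting criterion, assuming monitors are created through the Datadog Monitors API (POST /api/v1/monitor on the US1 site). It builds a metric-alert monitor whose message mentions both teams, since Datadog notifies every @-handle in a monitor message. The metric name abn_testing.api.errors and the handles @sre-team and @abn-testing-team are hypothetical placeholders, not existing configuration.

```python
import json
import urllib.request

# Datadog Monitors API endpoint (US1 site; other Datadog sites use a different host).
DATADOG_MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"

# HYPOTHETICAL notification handles: replace with the real Slack/PagerDuty/email
# handles configured in Datadog for the SRE and ABn Testing teams.
NOTIFY_HANDLES = ["@sre-team", "@abn-testing-team"]


def build_monitor(metric: str, threshold: float, handles: list[str]) -> dict:
    """Build a metric-alert monitor definition that notifies every handle in `handles`."""
    # Alert when the 5-minute average of the summed metric exceeds the threshold.
    # The threshold in the query must match options.thresholds.critical.
    query = f"avg(last_5m):sum:{metric}{{*}} > {threshold}"
    return {
        "name": f"[ABn Testing] {metric} above {threshold}",
        "type": "metric alert",
        "query": query,
        # Datadog sends a notification to each @-handle mentioned in the message.
        "message": f"{metric} exceeded {threshold} over the last 5 minutes. "
        + " ".join(handles),
        "tags": ["team:abn-testing"],
        "options": {"thresholds": {"critical": threshold}, "notify_no_data": False},
    }


def build_request(monitor: dict, api_key: str, app_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that would create the monitor."""
    return urllib.request.Request(
        DATADOG_MONITOR_URL,
        data=json.dumps(monitor).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    # HYPOTHETICAL metric name; use whatever metric the ABn Testing infra emits.
    monitor = build_monitor("abn_testing.api.errors", 5, NOTIFY_HANDLES)
    print(json.dumps(monitor, indent=2))
    # Sending it would be: urllib.request.urlopen(build_request(monitor, api_key, app_key))
```

Keeping the monitor definition in code makes the routing to both teams reviewable; whether we do this or configure monitors in the Datadog UI is still an open decision.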

Questions

  • Not sure how much work is involved or which Datadog features we should integrate. Any ideas, agarcia?
  • Should we keep using the monitoring from AWS managed services (SNS, Alarms) alongside Datadog?

 

 


Generated at Sun Feb 11 22:56:14 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.