Details
Story
Resolution: Unresolved
Minor
Description
Current situation
We have implemented basic monitoring and alerting using AWS only (ABTEST-363).
This is adequate for Public Beta, since we can still trace errors and incidents, but it is not fully managed and the monitoring is not in one place.
Solution
- A Datadog integration would give us more insight and metrics on what is happening in the infrastructure.
- The SRE team could monitor the infra and report issues to the team so that incidents are resolved more efficiently.
Questions
- Not sure how much work is involved or which Datadog features we should integrate. Any ideas, agarcia?
- Do we keep using the monitoring from AWS managed services (SNS, Alarms) once Datadog is integrated?
Checklists
Acceptance criteria