Date
2024-01-29
Authors
Michael Zabka, CTO
Michal Artazov, DevOps Lead
Vaclav Boch, DevOps
Impact: Displays had connection issues, and the system responded slowly to the commands it received.
Trigger: Overloaded platform services.
Detection: Internal monitoring and customer tickets.
Root Causes: The services that communicate directly with displays became overloaded. Their response times increased, which caused displays to reconnect. After several failed attempts, the displays fell back to our backup communication system. The fallback generated a large number of system messages, which overloaded our configuration servers and further increased latency across the system.
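The chain of events above can be illustrated with a small, simplified model. This is only a sketch: the thresholds, the linear latency formula, the message count, and every name in it (`response_time_ms`, `simulate`, and the constants) are hypothetical and are not taken from the actual platform.

```python
# Illustrative model of the failure chain described in the root cause.
# All constants and function names are assumptions made for this sketch.

MAX_ATTEMPTS = 3            # hypothetical: failed attempts before fallback
TIMEOUT_MS = 1000           # hypothetical: display-side response timeout
BASE_LATENCY_MS = 200       # hypothetical: response time without overload
CAPACITY = 100              # hypothetical: displays served without slowdown
MESSAGES_PER_FALLBACK = 5   # hypothetical: system messages per fallback


def response_time_ms(connected_displays: int) -> float:
    """Response time grows linearly once load exceeds capacity."""
    overload = max(0, connected_displays - CAPACITY)
    return BASE_LATENCY_MS * (1 + overload / 10)


def simulate(displays: int) -> dict:
    """Return the channel the displays end up on and the messages produced."""
    for _ in range(MAX_ATTEMPTS):
        if response_time_ms(displays) <= TIMEOUT_MS:
            # The service answers in time, so displays stay on the primary channel.
            return {"channel": "primary", "system_messages": 0}
        # Otherwise the attempt times out and every display reconnects.
    # All attempts failed: displays fall back to the backup system,
    # and each fallback produces system messages for the configuration servers.
    return {
        "channel": "backup",
        "system_messages": displays * MESSAGES_PER_FALLBACK,
    }


if __name__ == "__main__":
    print(simulate(100))
    print(simulate(200))
```

In this model the load never recovers between attempts, so once the timeout is exceeded every display ends up on the backup channel and the message volume scales with the number of displays.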
Remediation: