API responding with 500 to some requests
Incident Report for signageOS
Postmortem

Authors

Michal Artazov, Backend Team Leader

Summary

API requests failed because of a human error in the configuration of the API's request-limiting subsystem.

Impact

Some API requests unexpectedly started failing with a 500 response code.

Trigger

Human error.

Detection

We combine the Pingdom service and the Postman Monitoring feature to monitor the health of API endpoints. We were alerted to a large number of 500 response codes as soon as they started occurring.

Root Causes

We recently added functionality to the API that limits the number of requests each organization can make to certain endpoints. The goal was to prevent misuse of the API and excessive request volumes that can degrade the API’s performance for other clients.

At first we applied it to only two endpoints, and it worked well. Today we decided to apply it to two more endpoints: Upload Applet Version Files and Upload and Update Applet Version Files.

To configure a limit for an endpoint, we identify the endpoint by its HTTP method and a path-matching string in the format used by the Express.js library. We use the npm library path-to-regexp to match each request’s path against the configurations.
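
As a rough illustration of this kind of matching, here is a minimal sketch. The LimitConfig shape, the findLimit helper and the example path are hypothetical and not taken from our codebase. To keep the snippet self-contained, compilePath is a simplified stand-in for path-to-regexp that only understands ":name" segments; it is not the library's implementation.

```typescript
// Hypothetical shape of one rate-limit configuration entry.
interface LimitConfig {
  method: string;      // HTTP method, e.g. "POST"
  path: string;        // Express-style pattern (illustrative, not a real endpoint path)
  maxRequests: number; // illustrative limit value
}

// Simplified stand-in for path-to-regexp: turns ":name" segments into a
// single-segment wildcard and escapes everything else literally.
function compilePath(pattern: string): RegExp {
  const source = pattern
    .split("/")
    .map((segment) =>
      segment.startsWith(":")
        ? "[^/]+"
        : segment.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"),
    )
    .join("/");
  // Allow an optional trailing slash.
  return new RegExp(`^${source}/?$`);
}

// Returns the first configuration whose method and path both match the request.
function findLimit(
  configs: LimitConfig[],
  method: string,
  requestPath: string,
): LimitConfig | undefined {
  return configs.find(
    (config) =>
      config.method.toUpperCase() === method.toUpperCase() &&
      compilePath(config.path).test(requestPath),
  );
}
```

With a configuration such as { method: "POST", path: "/v1/applet/:appletUid/files", maxRequests: 10 }, a POST request to /v1/applet/abc123/files would be matched, while a GET request to the same path would not.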

Human error led to a configuration that contained an invalid path-matching string. Every time that string was matched against a request’s path, it threw an exception, causing the request to fail with a 500 response code.

Remediation

We fixed the configuration so that it contains a path-matching string in the correct format.

We also started working on a fix that validates the configurations before they are used, so that invalid configurations are discarded at runtime.
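
One possible shape for such a validation step is sketched below. It is an assumption about the approach, not the actual fix. The names LimitConfig, PathCompiler and validateLimitConfigs are hypothetical, and the compile function is injected so that whatever compiles the path pattern in the real system (for example, a call into path-to-regexp) can be passed in.

```typescript
// Hypothetical shape of one rate-limit configuration entry.
interface LimitConfig {
  method: string;
  path: string;
  maxRequests: number;
}

// Any function that compiles a path pattern and throws on an invalid one.
type PathCompiler = (pattern: string) => RegExp;

interface ValidationResult {
  valid: LimitConfig[];
  rejected: { config: LimitConfig; reason: string }[];
}

// Compiles every pattern once, up front. Configurations whose pattern fails
// to compile are set aside instead of being used at request time.
function validateLimitConfigs(
  configs: LimitConfig[],
  compile: PathCompiler,
): ValidationResult {
  const result: ValidationResult = { valid: [], rejected: [] };
  for (const config of configs) {
    try {
      compile(config.path); // throws if the pattern is malformed
      result.valid.push(config);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      result.rejected.push({ config, reason });
    }
  }
  return result;
}
```

Only the valid entries would then be handed to the request limiter, and the rejected entries could be logged, so that a malformed pattern results in a missing limit rather than a failed request.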

Posted Jun 22, 2022 - 14:49 CEST

Resolved
API requests failed because of a human error in the configuration of the API's request-limiting subsystem. This issue has been resolved; the API is no longer returning 500 response codes.
Posted Jun 22, 2022 - 00:00 CEST