Authors
Michal Artazov, Backend Team Leader
Summary
API requests failed because of a human error in the configuration of the API’s request-limiting subsystem.
Impact
Some API requests unexpectedly started failing with a 500 response code.
Trigger
Human error: an invalid path matching string was added to the request-limiting configuration.
Detection
We monitor the health of API endpoints with a combination of Pingdom and Postman’s monitoring feature. We were alerted to a large number of 500 responses as soon as they started occurring.
Root Causes
We recently added functionality to the API that limits the number of requests each organization can make to certain endpoints. The goal was to prevent misuse of the API, in particular excessive numbers of requests that can degrade the API’s performance for other clients.
At first we applied it to only two endpoints, and it worked well. Today we decided to apply it to two more endpoints: Upload Applet Version Files and Upload and Update Applet Version Files.
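To illustrate the kind of per-organization limiting described above, here is a minimal sketch. It assumes a fixed-window counter kept in memory. The class name, field names, and the fixed-window strategy are hypothetical and are not taken from the production implementation.

```typescript
// Hypothetical sketch of per-organization request limiting.
// The fixed-window strategy and all names here are assumptions for illustration.

interface LimitRule {
  maxRequests: number; // requests allowed per window
  windowMs: number;    // window length in milliseconds
}

interface WindowState {
  windowStart: number; // timestamp at which the current window began
  count: number;       // requests counted in the current window
}

class OrganizationRateLimiter {
  private readonly rule: LimitRule;
  private readonly windows = new Map<string, WindowState>();

  constructor(rule: LimitRule) {
    this.rule = rule;
  }

  // Returns true if the request is allowed, false if the organization
  // has already used up its quota for this endpoint in the current window.
  allow(organizationId: string, endpointKey: string, now: number = Date.now()): boolean {
    const key = `${organizationId}:${endpointKey}`;
    const state = this.windows.get(key);

    // Start a new window on the first request or once the previous one has expired.
    if (state === undefined || now - state.windowStart >= this.rule.windowMs) {
      this.windows.set(key, { windowStart: now, count: 1 });
      return true;
    }

    if (state.count >= this.rule.maxRequests) {
      return false;
    }

    state.count += 1;
    return true;
  }
}
```

Because the counter is keyed by both organization and endpoint, one organization exhausting its quota does not affect requests from other organizations.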
To configure a limit for an endpoint, we identify the endpoint by its HTTP method and a path matching string in the format used by the Express.js library. We use the npm library path-to-regexp to match each request’s path against the configurations.
Due to human error, one of the new configurations contained an invalid path matching string. Every attempt to match that string against a request’s path threw an exception, which caused the request to fail with a 500 response code.
Remediation
We fixed the configuration so that it contains the path matching string in the correct format.
We also started working on a fix that validates the configurations before they are used, so that invalid configurations are discarded at runtime instead of causing requests to fail.
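One possible shape for such a validation step is sketched below. It is not the actual fix. The configuration fields, the function names, and the callback are all hypothetical. The path compiler is passed in as a parameter: in the real system it would presumably be whichever compile function the installed version of path-to-regexp exposes, while the stand-in used here is built on plain `RegExp` so that the example runs on its own.

```typescript
// Hypothetical shape of one limit configuration; field names are assumptions.
interface LimitConfig {
  method: string;      // HTTP method, e.g. "POST"
  path: string;        // path matching string
  maxRequests: number; // allowed requests per window
}

interface CompiledLimitConfig extends LimitConfig {
  matcher: RegExp; // compiled once, at load time
}

// Turns a path matching string into a RegExp and throws on invalid input.
type PathCompiler = (path: string) => RegExp;

// Compiles every configuration up front and keeps only those that compile.
// Invalid entries are reported through `onInvalid` and then discarded,
// so they can never throw while a request is being handled.
function compileValidConfigs(
  configs: LimitConfig[],
  compile: PathCompiler,
  onInvalid: (config: LimitConfig, error: unknown) => void = () => {},
): CompiledLimitConfig[] {
  const valid: CompiledLimitConfig[] = [];
  for (const config of configs) {
    try {
      valid.push({ ...config, matcher: compile(config.path) });
    } catch (error) {
      onInvalid(config, error);
    }
  }
  return valid;
}

// Stand-in compiler for illustration only: it treats the path as a raw
// regular expression, which is NOT the Express.js path format.
const standInCompiler: PathCompiler = (path) => new RegExp(`^${path}$`);

// Usage with made-up paths: the second entry is malformed and gets dropped.
const loaded = compileValidConfigs(
  [
    { method: "POST", path: "/example/[^/]+/files", maxRequests: 10 },
    { method: "PUT", path: "/example/(", maxRequests: 10 },
  ],
  standInCompiler,
  (config, error) => console.warn(`Discarding limit config for ${config.path}:`, error),
);
console.log(loaded.map((c) => `${c.method} ${c.path}`));
```

Compiling once at load time also means a malformed entry surfaces as a single warning at startup rather than as an exception on every matching request.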