Pete's Log: Misadventures in API Design

Entry #2177, (Coding, Hacking, & CS stuff, Work)
(posted when I was 44 years old.)

I've dealt with many baffling REST APIs in my day, but one I've encountered recently for work has truly impressed me with its bafflingness.

The API in question is used to retrieve events. There are about 2000-3000 events a day. To retrieve events, one submits a GET request to a specific endpoint. If there are events available, the endpoint will respond with up to 100 events. The response does not include any metadata such as the total number of events. Instead, if 100 events were returned, then there may be more events available and you should hit the endpoint again. There aren't any parameters to set indicating you want the second (or nth) page of events. Instead, any events the endpoint has already returned to you are now gone as far as the service is concerned.
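The only termination signal is a short page, so a client has no choice but to loop until it gets one. Here's a minimal sketch of that drain loop; the `fetch_page` callable is my own abstraction (in practice it would wrap an HTTP GET against the vendor's endpoint and parse the response), since the vendor's actual URL and response shape aren't something I'll reproduce here:

```python
PAGE_SIZE = 100  # the vendor returns at most 100 events per request


def drain_events(fetch_page):
    """Fetch pages repeatedly until a short page arrives -- with no
    metadata and no cursor, a page of fewer than PAGE_SIZE events is
    the only signal that nothing more is (currently) pending."""
    events = []
    while True:
        page = fetch_page()
        events.extend(page)
        if len(page) < PAGE_SIZE:  # short page: assume we've drained it
            return events
```

With something like the `requests` library, `fetch_page` would be roughly `lambda: requests.get(url, timeout=30).json()`. Note the ugly corner this design paints you into: every call to `fetch_page` destroys data on the vendor side, so this loop had better not crash between the GET and whatever you do with `events`.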

Setting aside for a moment my purist concerns that GET requests are supposed to be safe and not alter server state, there are some real practical issues with this API.

  • Since this vendor does not offer a test endpoint, testing any changes before deploying is basically impossible: any request to the endpoint will "steal" events from the production job that retrieves them. (I suppose I could set up a local service that emulates their API, but that would probably miss edge cases.)
  • Any problems (even transient ones) between data retrieval and data storage will result in a loss of data.

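Since lost events can't be re-requested, the least-bad mitigation I can think of is to journal each raw page to local disk, fsynced, before doing anything else with it; downstream processing can then replay the journal at its leisure. A minimal sketch (the JSON-lines journal format and the function names are my own choices, not anything the vendor specifies):

```python
import json
import os


def persist_page(page, journal_path):
    """Append one raw page of events to an on-disk journal and fsync
    before returning, so a crash between retrieval and downstream
    storage can't silently drop events the vendor will never resend."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(page) + "\n")
        f.flush()
        os.fsync(f.fileno())  # don't trust the OS page cache with unrecoverable data


def replay_journal(journal_path):
    """Yield each previously persisted page, in order."""
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

This doesn't fix the API, but it narrows the window for data loss to the network hop itself rather than the whole retrieval-to-database pipeline.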
I don't know how this service is actually implemented on the backend. Maybe there's a way to get back data that was lost. But a conversation with one of their engineers did not inspire hope. He asked that we please make sure to hit the endpoint at least once an hour so not too much data backs up on their end.

At least this data isn't mission critical for us.