hhersch · Quickbase Staff · 6 years ago
API Communications - Best Practices
Dialing... busy... retrying... dialing...
Remember the days of dial-up internet, and the all-too-common busy signal? As frustrating as that was, the great part was that your modem was smart enough to retry the connection, which usually resulted in success. The attempt was only considered a failure after a certain number of retries.
While the days of dial-up might be over, the days of communication issues on the internet aren't, and they probably never will be. Internet communications are blisteringly fast these days, but there are more failures than you might think. The internet is a complex web of connected resources, and sometimes those connections break down. These breakdowns are commonly referred to as transient faults: faults that are temporary and likely resolved by the time the request is tried again. In fact, this probably happens to you all the time when browsing the web, but your browser is smart enough to handle it without you even knowing.
Why is this important? We know that many of our builders enhance their Quick Base applications with integrations by leveraging our API. While we are extremely proud of our uptime, virtually no service can guarantee 100% reliability on all requests made to the platform. This is especially true since web requests can break down before they even reach Quick Base, for any number of reasons; it could be as simple as a network router hitting a brief bottleneck. To alleviate the impact of these transient faults, it is best practice (for all integrations and web service calls, not only Quick Base) to implement API retries with exponential backoff. In simple terms, this means trying the request again after waiting a certain amount of time, and increasing the wait before each subsequent attempt. In most cases, this leads to a successful outcome. Microsoft publishes a helpful article that covers this type of mechanism in detail.
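To make the "waiting a certain amount of time, and increasing the wait" part concrete, here is a minimal sketch in Python of how an exponential backoff schedule is typically computed (the function name and the cap value are illustrative choices, not anything Quick Base prescribes):

```python
def backoff_delays(attempts, base=1.0, cap=30.0):
    """Return the wait (in seconds) before each retry: the delay doubles
    every attempt (1s, 2s, 4s, ...) and is capped so it never grows unbounded."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

# For five attempts with a 1-second base delay:
# backoff_delays(5) -> [1.0, 2.0, 4.0, 8.0, 16.0]
```

Many production implementations also add a small random "jitter" to each delay so that many clients retrying at once don't all hit the service at the same instant.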
There are a few important considerations here:
1. We usually do not want to retry an API call where the outcome is unlikely to change with time. For example, Quick Base might return an HTTP 400 (bad request) when a required field is missing. This scenario should not be retried, but rather exited gracefully and logged. The same can be said for an authentication error.
2. In the uncommon scenario where Quick Base, or any software application, has a legitimate service issue, or where an environment is being throttled, an integration should be careful not to accidentally mount what amounts to a denial-of-service (DoS) attack. This can happen inadvertently if retries occur too rapidly, which is why the concept of exponential backoff is so crucial. While there are many sophisticated approaches to timing the retries, the example below is a very simple illustration of how to ensure appropriate time is given between each attempt.
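Putting both considerations together, here is a minimal sketch in Python of a retry loop with exponential backoff and jitter. The `request_fn` callable and the specific status-code sets are illustrative assumptions for the sketch, not Quick Base specifics; in a real integration you would substitute your own HTTP call and tune the codes to your service:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}  # transient faults: worth retrying
# Anything else (e.g. 400 bad request, 401/403 auth errors) is treated as
# non-retryable, per consideration #1: exit gracefully and log instead.

def call_with_retries(request_fn, max_attempts=5, base_delay=1.0):
    """Call request_fn() until it returns HTTP 200, retrying transient
    failures with exponentially increasing delays plus random jitter.

    request_fn is assumed to return a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = request_fn()
        if status == 200:
            return body
        if status not in RETRYABLE:
            # Outcome won't change with time; do not retry.
            raise RuntimeError(f"non-retryable HTTP {status}; giving up")
        if attempt < max_attempts - 1:
            # 1s, 2s, 4s, ... plus jitter so clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

The key design point is that the delay doubles on each pass, so a struggling service sees rapidly decreasing pressure from each client rather than a steady hammering.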
One of the great things here is that when you leverage modern toolkits and platforms, a lot of this heavy lifting is done for you. For example, Zapier has an Autoreplay feature that intelligently replays tasks that failed against any web service, based on a non-200 HTTP response. Zapier serves many different endpoints - over 1,000 stock connectors as of this writing, plus Webhooks by Zapier, which can connect to almost any web service. Because of this, Zapier doesn't differentiate between transient faults and other business-related failures, like those mentioned in consideration #1. That's okay... in fact, it might give someone the opportunity to change something inside the source system so that the connection succeeds on the next attempt. If you are building something yourself in .NET, JavaScript, etc., you can best judge the retry logic that makes sense for your business.
The general takeaway here is that any solution that connects to a web service on the internet should be mindful of transient faults and be built to tolerate them. Putting this work in up-front reduces support overhead, avoids chasing down errors that don't need to be chased, and helps ensure the solution is built for scale.