In a distributed environment, calls to remote resources and services can fail because of transient faults, such as slow network connections, timeouts, or resources that are responding slowly or are temporarily unavailable. These faults are typically handled by the Retry pattern.
However, some faults are caused by unanticipated events that might take much longer to fix. They can range in severity from a partial loss of connectivity to the complete failure of a service. In these situations, it might be pointless for an application to keep retrying an operation that’s unlikely to succeed. Continued retries can also lead to cascading failures: callers that wait on a failing service hold on to resources, so latency grows, resources are exhausted, and the calling service becomes unresponsive and effectively unavailable for other requests. To prevent these kinds of failures, you can use the Circuit Breaker pattern, which prevents an application from performing an operation that’s likely to fail.
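To make the idea concrete, here is a minimal sketch of a circuit breaker in Python. It is one possible reading of the pattern, not a reference implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout` are assumptions chosen for illustration, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Illustrative breaker: closed -> open after N consecutive failures,
    then a single trial call is allowed once reset_timeout has elapsed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.reset_timeout = reset_timeout          # seconds, assumed default
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of calling the remote service.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = operation(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_orders, customer_id)` with a hypothetical `fetch_orders` function, and treat `CircuitOpenError` as a signal to fall back or fail fast rather than wait on a service that is known to be down.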
Related to the Circuit Breaker pattern is the Health Check API. It exposes an operation (e.g. HTTP /health) that returns the health of the service. By invoking this operation periodically, you can check the health of a service instance before actually calling it, assuming of course that the health check is sufficiently comprehensive.
The API endpoint handler can perform one or more of the following checks:
- the status of the connections to the infrastructure services used by the service instance
- the status of the host, e.g. disk space
- application-specific logic
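
The checks listed above could be wired into a `/health` endpoint roughly as follows. This is a sketch that uses only the Python standard library. The check functions, the 100 MB disk threshold, the `UP`/`DOWN` labels and the choice of 503 for an unhealthy instance are all assumptions made for illustration, and `check_database` is a placeholder for a real connectivity test.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_disk_space(path="/", min_free_bytes=100 * 1024 * 1024):
    """Host check: pass if at least min_free_bytes are free on path."""
    return shutil.disk_usage(path).free >= min_free_bytes


def check_database():
    """Hypothetical infrastructure check; replace with a real ping."""
    return True


# Registry of named checks (names are illustrative).
CHECKS = {"disk": check_disk_space, "database": check_database}


def run_health_checks(checks):
    """Run every check; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "UP" if check() else "DOWN"
        except Exception:
            results[name] = "DOWN"
    healthy = all(state == "UP" for state in results.values())
    status_code = 200 if healthy else 503
    return status_code, {"status": "UP" if healthy else "DOWN", "checks": results}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status_code, body = run_health_checks(CHECKS)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status_code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8080):
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer((host, port), HealthHandler).serve_forever()
```

Returning a non-200 status when any check fails lets a monitor or load balancer decide from the status code alone, while the JSON body shows which check failed.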