When a company acquires business-critical software, it has, or should have, an interest in concluding a service agreement with the supplier that guarantees the availability of the required service. The moments when service matters most are those when the delivered solution is "not working."
Therefore, a key element of an SLA (Service Level Agreement) is the categorization of incidents by urgency. For software, the "critical failure" category usually denotes a complete outage or the failure of critical components. In other words, it is a situation in which the software cannot be used for its intended purpose.
This might seem clear, but at least for data integrations the situation is more complicated. An integration service connects systems, and any of the integrated systems can cause a state that, at first glance, looks like an error or even a failure of the integration service itself.
There are many examples. Here is one from practice. A warehouse system generated bulk changes to tens of thousands of products and automatically sent these events to a process that synchronized them with an e-commerce platform. The problem was that the connected store enforces strict limits on the number of API calls. The synchronization therefore took several hours and blocked other calls to the store, temporarily leaving orders and inventory levels out of sync.
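A back-of-the-envelope estimate shows why a bulk change can occupy an API for hours. The figures below are assumptions for illustration only: 50,000 product updates, one API call per update, and a sustained limit of 2 calls per second. The article does not state the real product count or the platform's actual limits.

```python
# Hypothetical figures: neither value comes from the case described above.
BULK_UPDATES = 50_000     # assumed number of product updates
CALLS_PER_SECOND = 2.0    # assumed sustained API rate limit


def sync_duration_hours(updates: int, calls_per_second: float) -> float:
    """Lower bound on sync time, assuming one API call per update."""
    return updates / calls_per_second / 3600


if __name__ == "__main__":
    hours = sync_duration_hours(BULK_UPDATES, CALLS_PER_SECOND)
    print(f"{hours:.1f} h")  # 25,000 s, i.e. about 6.9 hours
```

If every call shares the same limit, any order or stock update submitted during those hours waits behind the bulk backlog. That matches the symptoms described, even though nothing is technically broken.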
However, no actual outage occurred. The integration service behaved exactly as it should have. Nevertheless, from the point of view of the store's operations, the situation clearly looked like an emergency.
When we talk about an integration service, we must ask which information is relevant for making the right decisions. As the example shows, a malfunction of the service itself is far from the only situation that can disrupt a company's operations. Non-functional software will probably account for only a negligible share of critical situations.
Therefore, when setting an SLA with an integration supplier, we recommend assessing each process individually, including its importance and the costs or losses its malfunction would cause. It is equally important to build technical measures into key integration processes that prevent dangerous situations. Integrations should not merely connect systems. Above all, they should ensure flawless automation.
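One possible technical measure for the warehouse scenario is to send all calls to the store through a single queue that respects the rate limit but dispatches by business priority. Urgent order and stock updates then overtake a bulk product backlog. The following is a minimal sketch under stated assumptions: the class name, the priority levels, and the "tick" abstraction for one rate-limit window are hypothetical and not taken from any particular integration platform.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is sent first.
PRIORITY_ORDER = 0       # orders should reach the store quickly
PRIORITY_INVENTORY = 1   # stock levels
PRIORITY_BULK = 2        # mass product updates can wait


@dataclass(order=True)
class ApiCall:
    priority: int
    seq: int                           # tie-breaker keeps FIFO order within a priority
    name: str = field(compare=False)


class PriorityRateLimitedQueue:
    """Outgoing API calls that share one rate limit, dispatched by priority."""

    def __init__(self, calls_per_tick: int) -> None:
        self.calls_per_tick = calls_per_tick
        self._heap: list[ApiCall] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, ApiCall(priority, next(self._counter), name))

    def tick(self) -> list[str]:
        """Simulate one rate-limit window: send at most `calls_per_tick` calls."""
        sent = []
        while self._heap and len(sent) < self.calls_per_tick:
            sent.append(heapq.heappop(self._heap).name)
        return sent

    def pending(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = PriorityRateLimitedQueue(calls_per_tick=2)
    for i in range(6):
        q.submit(f"product-{i}", PRIORITY_BULK)
    print(q.tick())        # ['product-0', 'product-1']
    q.submit("order-42", PRIORITY_ORDER)
    q.submit("stock-7", PRIORITY_INVENTORY)
    print(q.tick())        # ['order-42', 'stock-7']
    print(q.pending())     # 4 bulk updates still waiting
```

Strict priority can starve the bulk updates if urgent traffic never stops. A real design might reserve a fixed share of each window for low-priority calls. Which processes deserve which priority is exactly the kind of question the SLA discussion should settle.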