Suppose your service or information system fully complies with the principles of Cloud Native applications described above. You have chosen the right provider and migrated to the cloud, following all the recommendations. Does this mean that nothing threatens your services now and that downtime is completely ruled out? Unfortunately, no. Even after an application has been moved to the cloud, certain risks remain, which can be roughly divided into three groups.
Geographic Risks
Even the most reliable cloud provider is not immune to emergencies, whether natural disasters or man-made accidents. Fires, floods, and even an ordinary cable break are only a partial list of events that can knock a data center out for a long time and, in the worst case, destroy it.
Many people remember the fire at the OST data center of the provider DataLine in Moscow in June 2019. That time the damage was minor: all clients were promptly moved to backup sites, and the server rooms were virtually untouched; only the roof and the air conditioning system suffered.
The March 2021 fire at the SBG2 data center of the provider OVH in Strasbourg caused far greater losses: many services around the world went down, one data center (SBG2) was destroyed, and a second one nearby (SBG1), partially damaged by the fire, had to be taken out of service. This is precisely the essence of geographic risk: when a provider's data centers are located close to each other, they cannot back each other up in a disaster, and all of them are under threat at once.
Therefore, when choosing a provider, be sure to pay attention to two points:
- The provider has several data centers.
- The data centers are located at a sufficient distance from each other and, if possible, use different communication channels and Internet providers and are powered by different power plants.
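The distance criterion can be checked mechanically once you know roughly where a provider's sites are. The following is a minimal sketch rather than a definitive method: the site names, the coordinates, and the 100 km threshold are all hypothetical values chosen for illustration, and the great-circle (haversine) distance is only a rough proxy for how independent two sites really are.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def too_close_pairs(sites, min_km):
    """Return (name_a, name_b, distance_km) for every pair of sites closer than min_km."""
    result = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(sites.items(), 2):
        distance = haversine_km(*pos_a, *pos_b)
        if distance < min_km:
            result.append((name_a, name_b, distance))
    return result


# Hypothetical provider sites: name -> (latitude, longitude).
SITES = {
    "dc-east-1": (48.585, 7.750),
    "dc-east-2": (48.586, 7.752),   # deliberately next door to dc-east-1
    "dc-north-1": (50.690, 3.170),
}

if __name__ == "__main__":
    # The 100 km threshold is an assumption, not an industry standard.
    for a, b, d in too_close_pairs(SITES, min_km=100.0):
        print(f"{a} and {b} are only {d:.1f} km apart")
```

A pair flagged by such a check shares the same fire, flood, and power-grid exposure, so it should be counted as a single site when assessing redundancy.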
Government Risks
This group includes political and legislative decisions that may unexpectedly force a change of provider. The most striking recent example is the shutdown of the Parler social network in January 2021, when Amazon refused to continue hosting the platform and the application became unavailable. In Russia, one can recall 152-FZ, which prohibits storing users' personal data outside the Russian Federation and thereby automatically limits the choice of providers for certain organizations (the banking sector, medical organizations, and so on).
What sets government risks apart is that new laws and other political decisions usually demand a rapid response, so it is essential to be prepared at all times for an unplanned change of provider.
ISP Failure Risks
These are technical failures at the level of the provider as a whole, most often caused by human error, such as the release of faulty updates or mistakes in forecasting resource consumption. Even well-known cloud leaders like Amazon and Google regularly experience service outages. Google Cloud, for instance, has had over 100 incidents in the past year, while AWS suffers a major failure at least once a year. One such outage, in November 2020, caused various sites and applications to malfunction, including iRobot, Flickr, Roku, and Adobe Spark.
Clearly, even the most reliable cloud solutions are not protected from human error and cannot guarantee one hundred percent availability. Of course, the provider must maintain the stated SLA level and pay compensation when it is violated, but this is unlikely to make up for your lost time and potential customers in the event of a long disruption.
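To see what an SLA figure actually permits, it helps to convert the percentage into minutes. The sketch below does only that arithmetic; the function name and the sample SLA levels are illustrative assumptions and are not taken from any particular provider's agreement.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Maximum downtime in minutes that still satisfies the SLA over the period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Sample availability levels, chosen for illustration only.
    for sla in (99.0, 99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% availability allows {minutes:.1f} min of downtime per 30 days")
```

Even a 99.9% commitment leaves room for roughly 43 minutes of downtime in a 30-day month without any violation, and an outage lasting several hours only triggers whatever compensation the agreement specifies; it does not bring back the lost traffic.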
Thus, we have covered three types of risks you can face after moving your applications to the cloud. But while the first group of risks is easily eliminated by choosing a provider with geographically dispersed data centers, the risks from the second and third groups, in my opinion, can be eliminated only by applying the Multicloud Native Service approach.
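At its simplest, the idea behind running in more than one cloud is to keep an ordered list of providers and route to the first one that is currently healthy. The sketch below is a hypothetical illustration of that selection step only, not a description of any specific Multicloud Native Service implementation: the provider names and the stubbed health checks are assumptions, and a real check would probe an actual endpoint.

```python
from typing import Callable, Optional, Sequence, Tuple

HealthCheck = Callable[[], bool]


def pick_provider(providers: Sequence[Tuple[str, HealthCheck]]) -> Optional[str]:
    """Return the first provider, in priority order, whose health check passes."""
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated the same as an unhealthy provider.
            continue
    return None  # no provider is currently usable


if __name__ == "__main__":
    # Stubbed checks standing in for real probes (hypothetical providers).
    providers = [
        ("primary-cloud", lambda: False),   # simulated outage
        ("secondary-cloud", lambda: True),
    ]
    print(pick_provider(providers))
```

The selection itself is trivial; the hard part of a multicloud setup is keeping data and deployments in sync so that the secondary provider is actually able to take over.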