Chaos Engineering: Install LitmusChaos, Prometheus and Grafana
Chaos engineering is a methodology that facilitates experimenting on a system in order to enhance its capability to withstand complex failure conditions in production.
An application’s resilience often depends more on the underlying stack than on the application itself. Once the application is stabilised, the resilience of a service running on Kubernetes may depend on other components and infrastructure more than 90% of the time.
The process of continuously verifying that a service is resilient against faults is called chaos engineering.
A common way to introduce chaos is to deliberately inject faults that cause system components to fail. The goal is to observe, monitor, respond to, and improve the system’s reliability under adverse circumstances. For example, taking dependencies offline (stopping API apps, shutting down VMs, etc.), restricting access (enabling firewall rules, changing connection strings, etc.), or forcing failover (at the database level, Front Door, etc.) are all good ways to validate that the application handles faults gracefully.
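To make the fault-injection loop concrete, here is a minimal, purely simulated sketch in Python. It is not how LitmusChaos works internally. The names `FakeDependency`, `Service`, and `run_experiment` are hypothetical and exist only for illustration. The sketch assumes a service that falls back to cached data when its dependency goes offline, and it walks through the usual steps: check the steady state, inject a fault, observe the behaviour, roll the fault back, and verify recovery.

```python
from dataclasses import dataclass


class DependencyDown(Exception):
    """Raised by the simulated dependency while it is offline."""


@dataclass
class FakeDependency:
    # Hypothetical stand-in for a real backend (API app, VM, database, ...).
    online: bool = True

    def fetch(self) -> str:
        if not self.online:
            raise DependencyDown("dependency is offline")
        return "fresh data"


@dataclass
class Service:
    # Hypothetical service under test, with a simple cache as its fallback.
    dependency: FakeDependency
    cache: str = "cached data"

    def handle_request(self) -> str:
        # Graceful degradation: serve cached data if the dependency fails.
        try:
            self.cache = self.dependency.fetch()
            return "live"
        except DependencyDown:
            return "cached"


def run_experiment(service: Service) -> dict:
    """Steady state -> inject fault -> observe -> roll back -> verify recovery."""
    report = {"steady_state": service.handle_request()}
    service.dependency.online = False  # inject the fault
    try:
        report["during_fault"] = service.handle_request()
    finally:
        service.dependency.online = True  # always roll the fault back
    report["after_recovery"] = service.handle_request()
    return report


report = run_experiment(Service(FakeDependency()))
print(report)
```

In a real experiment the fault would be injected by a chaos tool against live infrastructure, and the observations would come from monitoring rather than return values. The shape of the loop stays the same.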