
29 posts tagged with "Chaos Engineering"


· 3 min read

Chaos Mesh Q&A

At KubeCon EU 2022, the Chaos Mesh team hosted two activities: the talk "Make Cloud Native Chaos Engineering Easier - Deep Dive into Chaos Mesh" and an office hours session. We are very grateful to everyone who joined, and we enjoyed the time with all of you. We shared ideas, got to know each other, and discussed many topics in depth.

For the presentations, we gave a brief overview of Chaos Mesh, then delved into how Chaos Mesh is implemented and how it is practiced, and shared the team's latest explorations around chaos engineering and plans for Chaos Mesh's development.

During the office hours session, we introduced the Chaos Mesh project and its latest progress, and answered online questions from attendees.

Many thanks to all our friends who came out to support us! We received some great questions during the office hours session, so we decided to publish a follow-up Q&A.

Your questions answered

Q: Does chaos play well with Windows/Linux hybrid clusters?

A: Chaos Mesh currently works only on Linux, but kind contributors are trying to port some features to Windows:

Q: I think Istio and Linkerd also support fault injection. How does Chaos Mesh differ? Chaos Mesh provides much richer chaos injections (like IOChaos, TimeChaos...), but the injection provided by Linkerd or Istio, as far as I know, is focused on the network?

A: Yes, of course! Service mesh frameworks can inject faults in the RPC/network layer. Chaos Mesh can inject more types of chaos, such as StressChaos, pod kill, DNSChaos, and IOChaos (which you just mentioned). Beyond that list, we offer additional types of chaos for JVM, GCP, Azure, and so on.

Q: As part of Chaos Mesh, can we run any pre-initialization scripts before introducing the chaos experiment?

A: Yes! You can orchestrate your custom scripts and various chaos experiments together with Chaos Mesh's built-in Workflow engine. See the task field in the Workflow documentation.
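As a rough illustration of the answer above, the sketch below builds a Workflow manifest that runs a script in a Task step and then a PodChaos step, one after the other. The field names follow the `chaos-mesh.org/v1alpha1` Workflow CRD as we understand it, so check them against the documentation for your version. The function name, template names, image, command, and the `app: demo` label are hypothetical placeholders.

```python
import json


def build_workflow(name: str, namespace: str) -> dict:
    """Build a Workflow manifest: a setup Task followed by a PodChaos step.

    All names, the image, the command, and the label selector are
    illustrative placeholders, not values from the Chaos Mesh project.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "Workflow",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entry": "entry",
            "templates": [
                {
                    # Serial runs its children in order.
                    "name": "entry",
                    "templateType": "Serial",
                    "deadline": "240s",
                    "children": ["pre-init", "kill-one-pod"],
                },
                {
                    # Task runs a user-defined container (the custom script).
                    "name": "pre-init",
                    "templateType": "Task",
                    "task": {
                        "container": {
                            "name": "pre-init",
                            "image": "busybox",
                            "command": ["sh", "-c", "echo preparing test data"],
                        }
                    },
                },
                {
                    # The chaos experiment that follows the script.
                    "name": "kill-one-pod",
                    "templateType": "PodChaos",
                    "deadline": "40s",
                    "podChaos": {
                        "action": "pod-kill",
                        "mode": "one",
                        "selector": {"labelSelectors": {"app": "demo"}},
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_workflow("pre-init-then-chaos", "default"), indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would be one way to try it on a test cluster.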

Q: Is this similar to the Gremlin Chaos engineering tool?

A: Yes. Chaos Mesh is a Kubernetes-specific open-source project that you can use as a Kubernetes plugin. You can get more information on

Q: How does it inject network latency for network chaos? If we use the Cilium CNI with no iptables, would this latency injection still work?

A: Chaos Mesh has a chaos-daemon component. When a network chaos experiment is created, chaos-daemon enters the target pod's network namespace and sets tc (traffic control) and iptables rules on the network device.

When using the Cilium CNI without iptables, Chaos Mesh still works.
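For readers who want to see what such a latency experiment might look like, here is a minimal sketch that builds a NetworkChaos manifest with the `delay` action. The field names follow the `chaos-mesh.org/v1alpha1` NetworkChaos CRD as we understand it, so verify them against the documentation for your version. The function name, label, and default values are hypothetical.

```python
import json


def build_network_delay(
    name: str,
    namespace: str,
    app_label: str,
    latency: str = "100ms",
    jitter: str = "0ms",
    duration: str = "60s",
) -> dict:
    """Build a NetworkChaos manifest that delays traffic of matching pods.

    The selector label and the default values are illustrative placeholders.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "delay",
            # "all" targets every pod matched by the selector.
            "mode": "all",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
            "delay": {"latency": latency, "jitter": jitter},
            "duration": duration,
        },
    }


if __name__ == "__main__":
    manifest = build_network_delay("delay-demo", "default", "demo")
    print(json.dumps(manifest, indent=2))
```

Once applied, it is chaos-daemon that translates the experiment into rules inside the target pod's network namespace, as described above.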

Join the Chaos Mesh community

If you are interested in Chaos Mesh and would like to help us improve it, you're welcome to join our Slack channel or submit your pull requests or issues to our GitHub repository.

· 7 min read
Chunxu Zhang

Experience as an LFX Mentee for Chaos Mesh

I am a graduate student studying software engineering at Nanjing University. My research focuses on DevOps, which has intrinsic connections with chaos engineering and observability. To get involved in the open-source community, understand Kubernetes more deeply, and experience the daily jobs around infrastructure, I applied for the CNCF LFX Mentorship in Fall 2021 to work on the Chaos Mesh project.

· 7 min read
Lei Li

How to Develop a Daily Reporting System to Track Chaos Testing Results

Chaos Mesh is a cloud-native chaos engineering platform that orchestrates chaos experiments on Kubernetes environments. It allows you to test the resilience of your system by simulating problems such as network faults, file system faults, and Pod faults. After each chaos experiment, you can review the testing results by checking the logs. But this is neither direct nor efficient. Therefore, I decided to develop a daily reporting system that would automatically analyze logs and generate reports. This way, it’s easy to examine the logs and identify the issues.

· 6 min read
Ningxuan Wang

Chaos Mesh + SkyWalking: Better Observability for Chaos Engineering

Chaos Mesh is an open-source cloud-native chaos engineering platform. You can use Chaos Mesh to conveniently inject failures and simulate abnormalities that might occur in reality, so you can identify potential problems in your system. Chaos Mesh also offers a Chaos Dashboard which allows you to monitor the status of a chaos experiment. However, this dashboard cannot let you observe how the failures in the experiment impact the service performance of applications. This hinders us from further testing our systems and finding potential problems.

· 18 min read
Mayo Cream

Implementing Chaos Engineering in K8s

Chaos Mesh is an open-source, cloud-native Chaos Engineering platform built on Kubernetes (K8s) custom resource definitions (CRDs). Chaos Mesh can simulate various types of faults and has an enormous capability to orchestrate fault scenarios. You can use Chaos Mesh to conveniently simulate various abnormalities that might occur in development, testing, and production environments and find potential problems in the system.

· 4 min read
Xiang Wang

How to run chaos experiments on your physical machine

Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. With Chaos Mesh, you can simulate a variety of failures, and use Chaos Dashboard, a web UI, to manage chaos experiments directly. Since it was open-sourced, Chaos Mesh has been adopted by many companies to ensure their systems’ resilience and robustness. But over the past year, we have frequently heard requests from the community asking how to run chaos experiments when the services are not deployed on Kubernetes.

· 6 min read
Shuyang Wu

Chaos Mesh helps Apache APISIX improve system stability

Apache APISIX is a cloud-native, high-performance, scalable microservices API gateway. It is one of the Apache Software Foundation's top-level projects and serves hundreds of companies around the world, processing their mission-critical traffic in industries including finance, the Internet, manufacturing, retail, and telecom operators. Our customers include NASA, the European Union's digital factory, China Mobile, and Tencent.

· 11 min read
Yinghao Wang


Chaos Mesh includes the StressChaos tool, which allows you to inject CPU and memory stress into your Pod. This tool can be very useful when you test or benchmark a CPU-sensitive or memory-sensitive program and want to know its behavior under pressure.

However, as we tested and used StressChaos, we found some issues with usability and performance. For example, why does StressChaos use far less memory than we configured? To correct these issues, we developed a new set of tests. In this article, I'll describe how we troubleshot these issues and corrected them. This information will enable you to get the most out of StressChaos.

· 5 min read
Debabrata Panigrahi

LFX Mentorship Experience

I’m a junior undergraduate majoring in Biomedical Engineering in the Department of Biotechnology and Medical Engineering at the National Institute of Technology Rourkela, India. For someone who started to code only because I was fascinated by it, it was all a journey of self-learning, filled with various adversities. But when I started with open-source contributions, it was all very beginner-friendly and I came across a lot of people who helped me learn the tech stack better.

· 9 min read
Keao Yang

Chaos Engineering - How to simulate I/O faults at runtime

In a production environment, filesystem faults might occur due to various incidents such as disk failures and administrator errors. As a Chaos Engineering platform, Chaos Mesh has supported simulating I/O faults in a filesystem ever since its early versions. By simply adding an IOChaos CustomResourceDefinition (CRD), we can watch how the filesystem fails and returns errors.

· 4 min read


NetEase Fuxi AI Lab is China’s first professional game AI research institution. Researchers use our Kubernetes-based Danlu platform for algorithm development, training and tuning, and online publishing. Thanks to the integration with Kubernetes, our platform is much more efficient. However, due to Kubernetes- and microservices-related issues, we are constantly testing and improving our platform to make it more stable.

· 3 min read


Chaos Mesh is proud to be in Hacktoberfest 2020!

Hosted by DigitalOcean, Intel and DEV, Hacktoberfest is an open source celebration open to everyone in our global community. This month-long (Oct 1 - Oct 31) event encourages everyone to help drive the growth of open source and make positive contributions to an ever-growing community, whether you’re an experienced developer or an open-source newbie learning to code. As long as you submit 4 PRs before Oct 31, you are eligible to claim a limited edition T-shirt (70000 in total on a first-come-first-served basis)!

· 6 min read
Xiang Wang

chaos-mesh-action - Integrate Chaos Engineering into Your CI

Chaos Mesh is a cloud-native chaos testing platform that orchestrates chaos in Kubernetes environments. While it’s well received in the community with its rich fault injection types and easy-to-use dashboard, it was difficult to use Chaos Mesh with end-to-end testing or the continuous integration (CI) process. As a result, problems introduced during system development could not be discovered before the release.

In this article, I will share how we use chaos-mesh-action, a GitHub action, to integrate Chaos Mesh into the CI process.

· 8 min read
Ben Ye, Chengwen Yin

TiPocket - Automated Testing Framework

Chaos Mesh is an open-source chaos engineering platform for Kubernetes. Although it provides rich capabilities to simulate abnormal system conditions, it still only solves a fraction of the Chaos Engineering puzzle. Besides fault injection, a full chaos engineering application consists of hypothesizing around defined steady states, running experiments in production, validating the system via test cases, and automating the testing.

This article describes how we use TiPocket, an automated testing framework, to build a full Chaos Engineering testing loop for TiDB, our distributed database.

· 9 min read
Cwen Yin

Clock synchronization in distributed system

Chaos Mesh, an easy-to-use, open-source, cloud-native chaos engineering platform for Kubernetes (K8s), has a new feature, TimeChaos, which simulates the clock skew phenomenon. Usually, when we modify clocks in a container, we want a minimized blast radius, and we don't want the change to affect the other containers on the node. In reality, however, implementing this can be harder than you think. How does Chaos Mesh solve this problem?

· 6 min read
Cwen Yin

Run your first chaos experiment in 10 minutes

Chaos Engineering is a way to test a production software system's robustness by simulating unusual or disruptive conditions. For many people, however, the transition from learning Chaos Engineering to practicing it on their own systems is daunting. It sounds like one of those big ideas that require a fully-equipped team to plan ahead. Well, it doesn't have to be. To get started with chaos experimenting, you may be just one suitable platform away.

· 11 min read
Cwen Yin

Chaos Engineering

Why Chaos Mesh?

In the world of distributed computing, faults can happen to your clusters unpredictably, any time and anywhere. Traditionally, we have unit tests and integration tests that are meant to guarantee a system is production ready, but these cover just the tip of the iceberg as clusters scale, complexities mount, and data volumes grow to the petabyte level. To better identify system vulnerabilities and improve resilience, Netflix invented Chaos Monkey, which injects various types of faults into the infrastructure and business systems. This is how Chaos Engineering originated.