Status Check in Workflow
In Workflow, a status check executes specified operations against external systems, such as application systems and monitoring systems, to obtain their status, and can automatically abort the Workflow when it finds that a system is unhealthy. The concept is similar to Container Probes in Kubernetes. This article describes how to execute status checks in Workflow using YAML files.
Chaos Mesh does not yet support creating StatusCheck nodes on Chaos Dashboard, so for now you can only create StatusCheck nodes using YAML.
Status Check type
Chaos Mesh only supports the HTTP type for executing status checks.
Define an HTTP StatusCheck node
A StatusCheck node sends GET or POST HTTP requests to the specified URL, with custom headers and body, and then determines the result of each request according to the conditions in the criteria field.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    intervalSeconds: 1
    timeoutSeconds: 1
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'
This configuration defines a StatusCheck node of the HTTP type. The deadline field specifies that this node can run for a maximum of 20 seconds. The mode field specifies that this node executes status checks continuously. The intervalSeconds field specifies a repetition interval of 1 second. The timeoutSeconds field specifies the timeout for each execution, which is 1 second here.
When the Workflow reaches this StatusCheck node, the specified status check is executed every second. The status check uses the GET method to send an HTTP request to the URL http://123.123.123.123. If the response is returned within 1 second and the status code is 200, this execution succeeds; otherwise, it fails.
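The example above uses GET with no headers or body. As a minimal sketch of the POST variant with custom headers and a body, the node might look like the following. The header name, header value, and body content are illustrative assumptions, not values required by Chaos Mesh. The headers field is a map from a header name to a list of values, as given in the field description at the end of this article.

```yaml
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    intervalSeconds: 1
    timeoutSeconds: 1
    http:
      url: http://123.123.123.123
      method: POST
      # Assumed header for illustration; each header maps to a list of values.
      headers:
        Content-Type:
          - application/json
      # Assumed request body for illustration; the body is a plain string.
      body: '{"a":"b"}'
      criteria:
        statusCode: '200'
```

The body is quoted so that YAML treats the JSON text as a single string rather than parsing it as a nested mapping.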
Status Check results
Each execution of the status check produces an execution result, either Success or Failure. Because a single execution result may not reflect the real state of the system, due to fluctuations in certain conditions, the final status check result is not determined by a single execution result.
The StatusCheck node has two fields, failureThreshold and successThreshold:
- When the number of consecutive failed execution results reaches the failureThreshold, the status check result is considered to be a Failure.
- When the number of consecutive successful execution results reaches the successThreshold, the status check result is considered to be a Success.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'
In this configuration, the StatusCheck node executes status checks continuously:
- When the execution result is Success 1 or more consecutive times, the status check result is considered to be a Success.
- When the execution result is Failure 3 or more consecutive times, the status check result is considered to be a Failure.
In the following sections, "status check fails" means that the status check result is Failure, not that a single execution result is Failure.
When the Status Check is unsuccessful, abort the Workflow
The StatusCheck node only supports aborting the workflow automatically when the status check is unsuccessful. It cannot pause or resume the workflow.
While chaos experiments are running, the application system might become unhealthy. This feature can be used to restore the application system by quickly ending the chaos experiments. To make the workflow abort automatically when the status check fails, set the abortWithStatusCheck field to true on the StatusCheck node.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  abortWithStatusCheck: true
  statusCheck:
    mode: Continuous
    type: HTTP
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'
The status check is considered unsuccessful when any of the following conditions are met:
- The status check fails.
- The StatusCheck node times out and the status check result is not Success. For example, suppose successThreshold is 1 and failureThreshold is 3, and at the moment the node times out there have been 2 consecutive failures and 0 successes. Although this does not meet the condition for "status check fails", the status check is still considered unsuccessful.
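To show where such a node sits in a complete Workflow, here is a hedged sketch that runs the status check in parallel with a chaos experiment, so that the experiment is ended if the check is unsuccessful. The workflow name, the entry template, the NetworkChaos template, and the app: hello-kubernetes label selector are all assumptions made for illustration. Only the StatusCheck template comes from this article.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: try-workflow-status-check   # assumed name
spec:
  entry: the-entry
  templates:
    # Assumed entry node: runs the status check and the experiment side by side.
    - name: the-entry
      templateType: Parallel
      deadline: 60s
      children:
        - workflow-status-check
        - workflow-network-chaos
    # The StatusCheck node described above.
    - name: workflow-status-check
      templateType: StatusCheck
      deadline: 20s
      abortWithStatusCheck: true
      statusCheck:
        mode: Continuous
        type: HTTP
        http:
          url: http://123.123.123.123
          method: GET
          criteria:
            statusCode: '200'
    # Assumed experiment: injects network delay into pods matching the label.
    - name: workflow-network-chaos
      templateType: NetworkChaos
      deadline: 20s
      networkChaos:
        direction: to
        action: delay
        mode: all
        selector:
          labelSelectors:
            app: hello-kubernetes
        delay:
          latency: '90ms'
```

Running the check as a sibling of the experiment under a Parallel node is one possible layout. For the exact node types and their fields, refer to Create Chaos Mesh Workflow.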
Status Check mode
Continuous Status Check
When the mode field is Continuous, the StatusCheck node executes status checks continuously until the node times out or the status check fails.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Continuous
    type: HTTP
    intervalSeconds: 1
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'
In this configuration, the StatusCheck node executes a status check every second and exits when either of the following conditions is met:
- The status check fails, that is, there are 3 or more consecutive failed execution results.
- The node times out after 20 seconds.
One-time Status Check
When the mode field is Synchronous, the StatusCheck node exits as soon as the status check result is determined, or when the node times out.
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Synchronous
    type: HTTP
    intervalSeconds: 1
    successThreshold: 1
    failureThreshold: 3
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        statusCode: '200'
In this configuration, the StatusCheck node executes a status check every second and exits when any of the following conditions is met:
- The status check succeeds, that is, there are 1 or more consecutive successful execution results.
- The status check fails, that is, there are 3 or more consecutive failed execution results.
- The node times out after 20 seconds.
StatusCheck vs HTTP Request Task
Similarities:
- Both the StatusCheck node and the HTTP Request Task node (the Task node that executes HTTP requests) are node types of Workflow.
- Both the StatusCheck node and the HTTP Request Task node can obtain information about external systems through HTTP requests.
Differences:
- The HTTP Request Task node can send an HTTP request only once; it cannot send HTTP requests continuously.
- The HTTP Request Task node cannot affect the status of the workflow when the request fails, for example by aborting the workflow.
Field description
For more information about Workflow and Template, refer to Create Chaos Mesh Workflow.
StatusCheck field description
Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
mode | string | The execution mode of the status check. Supported values: Synchronous / Continuous. | None | Yes | Synchronous |
type | string | The type of the status check. Supported value: HTTP. | HTTP | Yes | HTTP |
duration | string | The duration of the whole status check, provided the number of failed executions does not exceed the failureThreshold. It is available in both Synchronous and Continuous modes. | None | No | 100s |
timeoutSeconds | int | The timeout, in seconds, for each execution of the status check. | 1 | No | 1 |
intervalSeconds | int | How often, in seconds, to perform an execution of the status check. | 1 | No | 1 |
failureThreshold | int | The minimum number of consecutive failures for the status check to be considered failed. | 3 | No | 3 |
successThreshold | int | The minimum number of consecutive successes for the status check to be considered successful. | 1 | No | 1 |
recordsHistoryLimit | int | The number of records to retain. | 100 | No | 100 |
http | HTTPStatusCheck | The details of the HTTP request to execute. | None | No | |
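As a reading aid for the table above, the following sketch writes out every optional statusCheck field at its documented default value. It is meant to be equivalent to omitting those fields, assuming the defaults behave as listed. The duration field is left out because it has no default.

```yaml
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Synchronous          # required, no default
    type: HTTP                 # the only supported type
    timeoutSeconds: 1          # default: timeout of each execution
    intervalSeconds: 1         # default: one execution per second
    failureThreshold: 3        # default: 3 consecutive failures mean Failure
    successThreshold: 1        # default: 1 success means Success
    recordsHistoryLimit: 100   # default: number of records to retain
    http:
      url: http://123.123.123.123
      method: GET              # default method
      criteria:
        statusCode: '200'
```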
HTTPStatusCheck field description
Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
url | string | The URL of the HTTP request. | None | Yes | http://123.123.123.123 |
method | string | The method of the HTTP request. Supported values: GET / POST. | GET | No | GET |
headers | map[string][]string | The headers of the HTTP request. | None | No | |
body | string | The body of the HTTP request. | None | No | {"a":"b"} |
criteria | HTTPCriteria | Defines how to determine the result of the HTTP StatusCheck. | None | Yes | |
HTTPCriteria field description
Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
statusCode | string | The expected HTTP status code for the request. A statusCode string can be a single code (for example, 200) or an inclusive range (for example, 200-400, where both 200 and 400 are included). | None | Yes | 200 |
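The range form of statusCode can be sketched as follows. The range 200-299 is an assumed choice for illustration, treating any 2xx response as a successful execution. It uses the inclusive range syntax described in the table above.

```yaml
- name: workflow-status-check
  templateType: StatusCheck
  deadline: 20s
  statusCheck:
    mode: Synchronous
    type: HTTP
    http:
      url: http://123.123.123.123
      method: GET
      criteria:
        # Inclusive range: 200 and 299 both count as a match.
        statusCode: '200-299'
```

The value is quoted because statusCode is a string field, and the quotes keep a single code such as '200' from being read as an integer.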