epazote.yml ​
Epazote uses a YAML configuration file. Below are the available options:
The basic shape ​
Every service has the same high-level structure:
services:
  service-name:
    every: 30s
    url: http://127.0.0.1:8080/health
    expect:
      status: 200
Think of it like this:
- every: how often to check
- url or test: what to check
- expect: what counts as healthy
- if_not: what to do when the check is not healthy
services:
  example-service:
    every: 5m
    url: https://example.com
    method: GET
    follow_redirects: true
    max_bytes: 1024
    timeout: 10s
    expect:
      status: 200
Choose the right matcher ​
Use the simplest matcher that fits your service:
- only the HTTP status matters: use expect.status
- the response is text or HTML: use expect.body
- the response is JSON: use expect.json
- you need shell logic or external tools: use test
Examples:
services:
  http_status_only:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200
  plain_text_check:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200
      body: ok
  json_check:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
  shell_check:
    test: pgrep -x nginx
    every: 30s
    expect:
      status: 0
test and if_not.cmd are executed with the current shell from the SHELL environment variable, falling back to sh if it is not set. If you want to run a script, the most reliable option is to make it executable and give it a shebang such as #!/usr/bin/env bash or #!/bin/sh.
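The shell lookup described above can be approximated with ordinary parameter expansion. This is only a sketch: the run_check helper is hypothetical, and handing the command to the shell with -c is an assumption about how Epazote invokes it, not something this page confirms.

```shell
#!/bin/sh
# Hypothetical helper: run a command string with $SHELL when it is set,
# falling back to sh otherwise.
# Assumption: the command string is passed to the shell with -c.
run_check() {
    "${SHELL:-sh}" -c "$1"
}

# The exit status of the command becomes the exit status of the check.
if run_check 'exit 0'; then echo "healthy"; fi
run_check 'exit 3' || echo "unhealthy, exit status $?"
```

Because the command goes through a shell, pipes and redirections inside the string should behave as they do at an interactive prompt, which is also why an executable script with a shebang is the more predictable choice for anything longer than a one-liner.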
every: Specifies how often the service is checked. Supports s (seconds), m (minutes), h (hours), and d (days).
url: The URL to check. (test can be used instead of url to check the exit status of a command.)
method: The HTTP method to use when checking the URL. (default: GET)
INFO
You can use any of the following methods:
CONNECT
DELETE
GET
HEAD
OPTIONS
PATCH
POST
PUT
TRACE
follow_redirects: Follow HTTP redirects. (default: false)
max_bytes: The maximum number of bytes to read from the response. (default: No limit)
TIP
If you want to search for specific content in the response, you can use the max_bytes option to limit the number of bytes read from the response.
For example, if you want to search for the word "success" in the response, you can set max_bytes to a value that you know will contain the word "success":
services:
  example-service:
    every: 5m
    url: http://example.com
    max_bytes: 1024
    expect:
      status: 200
      body: success
If max_bytes is not set, the response is read in chunks, and reading stops as soon as the word "success" is found or the end of the response is reached.
timeout: The maximum time to wait for a response. (default: 5s)
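To make the every suffixes listed above concrete, the following sketch converts a duration such as 30s or 5m into seconds. The duration_to_seconds helper is hypothetical and only illustrates the four documented units; it is not Epazote's own parser and may accept or reject values differently.

```shell
#!/bin/sh
# Hypothetical helper: convert an `every` value (30s, 5m, 2h, 1d) to seconds.
duration_to_seconds() {
    value=${1%[smhd]}     # numeric part, e.g. "5" from "5m"
    unit=${1#"$value"}    # unit suffix, e.g. "m" (empty if no suffix)
    case $value in
        # Reject empty, non-numeric, or zero-padded values
        # (zero-padded numbers would be read as octal by the shell).
        ''|*[!0-9]*|0?*) echo "invalid duration: $1" >&2; return 1 ;;
    esac
    case $unit in
        s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        d) echo $((value * 86400)) ;;
        *) echo "invalid duration: $1" >&2; return 1 ;;
    esac
}

duration_to_seconds 30s   # prints 30
duration_to_seconds 5m    # prints 300
```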
Logging ​
By default, Epazote prints human-readable logs. If you prefer structured output, run it with --json-logs.
For HTTP checks in pretty mode:
- healthy checks are logged as compact INFO entries
- failed expectation checks are logged as WARN entries
- response headers are shown only for failed HTTP checks
expect ​
expect defines expected responses from the service.
expect:
  status: 200
  body: "success"
  if_not:
    cmd: "sudo systemctl restart example-service"
status: Expected HTTP status code, or the expected exit status when test is used instead of url.
body: Expected response body, using a plain substring match by default or a raw regex when prefixed with r"...".
json: Expected response body parsed as JSON and matched structurally.
if_not: Actions to take if expectations fail.
INFO
expect.header is present in the config schema but response-header matching is not enforced yet. For now, use status, body, or json to validate responses.
if_not ​
if_not defines the actions to take if the check fails.
if_not:
  threshold: 3
  stop: 2
  cmd: "systemctl restart example-service"
  http: "http://alert-service/restart"
threshold: Number of consecutive failed checks required before the fallback action is executed. (default: 1)
stop: Maximum number of times the cmd or http fallback is executed; after that, it is no longer called.
cmd: Command to run if the check fails.
http: HTTP endpoint to call if the check fails.
threshold counts consecutive failures. A successful check resets the failure counter to 0.
stop is not a failure threshold. It only limits how many times Epazote will execute the fallback action after the threshold has been reached.
This is the easiest way to think about the two together:
- threshold: when fallback starts
- stop: when fallback stops
Example:
if_not:
  threshold: 3
  stop: 2
  cmd: systemctl restart example-service
With every: 30s, that means:
- first failed check: do nothing
- second failed check: do nothing
- third failed check: run the command
- fourth failed check: run the command
- fifth failed check: do not run the command anymore
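The sequence above can be reproduced with a small simulation. This is a sketch under stated assumptions: simulate_failures is a hypothetical helper that only models a run of consecutive failures with both threshold and stop set; it does not model a successful check in between or a configuration without stop.

```shell
#!/bin/sh
# Hypothetical simulation of the threshold/stop rules.
# Arguments: threshold, stop, number of consecutive failed checks.
simulate_failures() {
    threshold=$1 stop=$2 checks=$3
    failures=0   # consecutive failed checks so far
    runs=0       # how many times the fallback has been executed
    while [ "$failures" -lt "$checks" ]; do
        failures=$((failures + 1))
        if [ "$failures" -ge "$threshold" ] && [ "$runs" -lt "$stop" ]; then
            runs=$((runs + 1))
            echo "check $failures: run fallback"
        else
            echo "check $failures: skip"
        fi
    done
}

# threshold: 3, stop: 2, five failed checks in a row
simulate_failures 3 2 5
```

With these arguments the fallback runs on the third and fourth failed checks only, matching the list above.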
Example with expect.json and if_not:
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
        data:
          activeTargets:
            - labels:
                job: DBMI-lab-nico
              health: up
      if_not:
        threshold: 3
        stop: 3
        cmd: systemctl restart vmagent
Environment variables for if_not.cmd ​
When Epazote runs if_not.cmd, it passes service context through EPAZOTE_* environment variables. This makes alert scripts easier to write without parsing log output.
Available variables:
- EPAZOTE_SERVICE_NAME
- EPAZOTE_SERVICE_TYPE (http or command)
- EPAZOTE_URL for HTTP checks
- EPAZOTE_TEST for command checks
- EPAZOTE_EXPECTED_STATUS
- EPAZOTE_ACTUAL_STATUS when available
- EPAZOTE_ERROR
- EPAZOTE_FAILURE_COUNT
- EPAZOTE_THRESHOLD
Example:
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
      if_not:
        threshold: 3
        stop: 1
        cmd: /usr/local/bin/send-alert.sh
Example script:
#!/usr/bin/env bash
set -euo pipefail
printf 'service=%s\n' "${EPAZOTE_SERVICE_NAME:-}"
printf 'type=%s\n' "${EPAZOTE_SERVICE_TYPE:-}"
printf 'error=%s\n' "${EPAZOTE_ERROR:-}"
printf 'expected=%s actual=%s\n' "${EPAZOTE_EXPECTED_STATUS:-}" "${EPAZOTE_ACTUAL_STATUS:-}"
printf 'failure_count=%s threshold=%s\n' "${EPAZOTE_FAILURE_COUNT:-}" "${EPAZOTE_THRESHOLD:-}"
Body options (json,form,text) ​
To submit data, for example with the POST method, you have three options:
- json: sends the data as JSON
- form: sends the data as a form
- text: sends the data as text
The headers are set automatically based on the body type, but you can change them with the headers option if needed.
Example submitting data as JSON:
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body:
      json:
        key: value
Example submitting data as a form:
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body:
      form:
        key: value
Example submitting data as text:
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body: "Hello World!"
    headers:
      content-type: text/plain
TIP
You can override the default headers by adding a headers key.
For example in the case of sending a text body, you can set the content-type to text/plain, together with other custom headers:
headers:
  content-type: text/plain
  X-Custom-Header: TestValue
Body regular expressions ​
You can match the body of the response in two ways.
Without the r"..." prefix, body is treated as plain text and matched as a substring. For example, to match the word "success" in the body:
services:
  example-service:
    every: 5m
    url: http://example.com
    expect:
      status: 200
      body: success
For more complex matching, write the body as a raw regular expression using r"<your regex>":
services:
  example-service:
    every: 5m
    url: http://example.com
    expect:
      status: 200
      body: r"success|ok"
That means:
- body: success checks whether the response contains the text success
- body: r"success|ok" uses a raw regular expression
If the response is JSON, prefer expect.json over regex. It is easier to read and less fragile.
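The difference between the two body modes can be tried locally with grep. This is a rough sketch, not Epazote's matcher: body_matches is a hypothetical helper, and grep -E is an assumed stand-in for the regex engine, whose syntax may differ.

```shell
#!/bin/sh
# Hypothetical helper mirroring the two matching modes.
# $1 is the expect.body value, $2 is the response body.
# Assumption: grep -E approximates the regex engine.
body_matches() {
    case $1 in
        'r"'*'"')
            pattern=${1#'r"'}        # strip the leading r"
            pattern=${pattern%'"'}   # strip the trailing "
            printf '%s' "$2" | grep -Eq -- "$pattern"
            ;;
        *)
            # No r"..." wrapper: plain substring match.
            printf '%s' "$2" | grep -Fq -- "$1"
            ;;
    esac
}

body_matches 'success' '{"status":"success"}' && echo "substring match"
body_matches 'r"success|ok"' 'all ok' && echo "regex match"
body_matches 'success|ok' 'all ok' || echo 'no match: without the r"..." prefix, | is literal'
```

The last call shows why the prefix matters: without it, the whole string success|ok is searched for literally.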
JSON response matching ​
Use expect.json when the response is JSON and you want structural matching instead of text matching:
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
Nested objects are matched recursively, so you can check only the fields you care about:
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
        data:
          activeTargets:
            - labels:
                job: DBMI-lab-nico
              health: up
Notes:
- expect.body and expect.json are mutually exclusive
- objects are matched as subsets, so extra fields in the response are allowed
- array expectations match when each expected element matches at least one element in the actual response array
- if_not works with expect.json the same way it works with expect.body
- if_not.threshold defaults to 1, which preserves the previous behavior
Test command ​
Instead of using a URL, you can use the test key to check the exit status of a command:
services:
  example-service:
    every: 5m
    test: "pgrep -x httpd"
    expect:
      status: 0
test: The shell command to execute.
status: The expected exit status of the command.
Epazote runs test with the current shell from SHELL, falling back to sh. For more complex logic, prefer calling an executable script:
services:
  example-service:
    every: 5m
    test: /usr/local/bin/check-httpd.sh
    expect:
      status: 0
It can also be used with if_not to perform actions if the command fails:
services:
  example-service:
    every: 5m
    test: pgrep -x httpd
    expect:
      status: 0
      if_not:
        cmd: sudo systemctl restart httpd