epazote.yml

Epazote uses a YAML configuration file. Below are the available options:

The basic shape

Every service has the same high-level structure:

yaml
services:
  service-name:
    every: 30s
    url: http://127.0.0.1:8080/health
    expect:
      status: 200

Think of it like this:

  • every: how often to check
  • url or test: what to check
  • expect: what counts as healthy
  • if_not: what to do when the check is not healthy

A more complete HTTP check looks like this:

yaml
services:
  example-service:
    every: 5m
    url: https://example.com
    method: GET
    follow_redirects: true
    max_bytes: 1024
    timeout: 10s
    expect:
      status: 200

Choose the right matcher

Use the simplest matcher that fits your service:

  • only the HTTP status matters: use expect.status
  • the response is text or HTML: use expect.body
  • the response is JSON: use expect.json
  • you need shell logic or external tools: use test

Examples:

yaml
services:
  http_status_only:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200

  plain_text_check:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200
      body: ok

  json_check:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success

  shell_check:
    test: pgrep -x nginx
    every: 30s
    expect:
      status: 0

test and if_not.cmd are executed with the current shell from the SHELL environment variable, falling back to sh if it is not set. If you want to run a script, the most reliable option is to make it executable and give it a shebang such as #!/usr/bin/env bash or #!/bin/sh.
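
The shell selection described above can be approximated as follows. This is only a sketch: the function name run_with_shell is hypothetical, and passing the command string with -c is an assumption about how Epazote invokes the shell.

```bash
# Illustration only: run a command string with $SHELL, falling back to sh.
run_with_shell() {
  "${SHELL:-sh}" -c "$1"
}

# The exit status of the command string is what a check would compare with expect.status.
run_with_shell 'exit 3' || echo "exit status: $?"
```

Because the command is handed to a shell, pipes and && chains are possible, but an executable script with a shebang behaves the same regardless of which shell SHELL points to.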

every: Specifies how often the service is checked. Supports s (seconds), m (minutes), h (hours), and d (days).
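
To make the units concrete, the sketch below converts an every value to seconds. It is an illustration, not Epazote's parser: the function name duration_to_seconds is hypothetical, and it assumes a single number followed by a single unit.

```bash
# Illustration only: convert a single-unit duration such as 30s, 5m, 2h or 1d to seconds.
duration_to_seconds() {
  local value=${1%[smhd]} unit=${1: -1}
  case $unit in
    s) echo $(( 10#$value )) ;;
    m) echo $(( 10#$value * 60 )) ;;
    h) echo $(( 10#$value * 3600 )) ;;
    d) echo $(( 10#$value * 86400 )) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

duration_to_seconds 30s   # prints 30
duration_to_seconds 5m    # prints 300
duration_to_seconds 1d    # prints 86400
```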

url: The URL to check. (test can be used instead of url to check the exit status of a command.)

method: The HTTP method to use when checking the URL. (default: GET)

INFO

You can use any of the following methods:

CONNECT
DELETE
GET
HEAD
OPTIONS
PATCH
POST
PUT
TRACE

follow_redirects: Follow HTTP redirects. (default: false)

max_bytes: The maximum number of bytes to read from the response. (default: No limit)

TIP

If you want to search for specific content in the response, you can use max_bytes to limit how many bytes are read.

For example, to search for the word "success", set max_bytes to a value large enough to include it:

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    max_bytes: 1024
    expect:
      status: 200
      body: success

If max_bytes is not set, the response is read in chunks, and reading stops as soon as the word "success" is found or the response ends.
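
The effect of a byte limit can be tried out in the shell. The sketch below is not how Epazote reads responses; it only shows that a match has to fall within the first max_bytes bytes. The function name body_contains_within is hypothetical.

```bash
# Illustration only: succeed when "$2" occurs within the first "$1" bytes of stdin.
body_contains_within() {
  local max_bytes=$1 needle=$2
  head -c "$max_bytes" | grep -qF -- "$needle"
}

# "success" sits within the first 1024 bytes, so this prints "match".
printf 'status: success\n' | body_contains_within 1024 success && echo "match"

# With a limit of 8 bytes only "status: " is read, so this prints "no match".
printf 'status: success\n' | body_contains_within 8 success || echo "no match"
```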

timeout: The maximum time to wait for a response. (default: 5s)

Logging

By default, Epazote prints human-readable logs. If you prefer structured output, run it with --json-logs.

For HTTP checks in pretty mode:

  • healthy checks are logged as compact INFO entries
  • failed expectation checks are logged as WARN entries
  • response headers are shown only for failed HTTP checks

expect

expect defines expected responses from the service.

yaml
expect:
  status: 200
  body: "success"
  if_not:
    cmd: "sudo systemctl restart example-service"

  • status: Expected HTTP status code, or the expected exit status when test is used instead of url.

  • body: Expected response body using a plain substring match by default, or a raw regex when prefixed with r"...".

  • json: Expected response body parsed as JSON and matched structurally.

  • if_not: Actions to take if expectations fail.

INFO

expect.header is present in the config schema but response-header matching is not enforced yet. For now, use status, body, or json to validate responses.

if_not

if_not defines the actions to take when a check fails.

yaml
if_not:
  threshold: 3
  stop: 2
  cmd: "systemctl restart example-service"
  http: "http://alert-service/restart"

  • threshold: Number of consecutive failed checks required before the fallback action is executed. (default: 1)
  • stop: Maximum number of times the cmd or http action is executed. After that, it is no longer called.
  • cmd: Command to run if the check fails.
  • http: HTTP endpoint to call if the check fails.

threshold counts consecutive failures. A successful check resets the failure counter to 0.

stop is not a failure threshold. It only limits how many times Epazote will execute the fallback action after the threshold has been reached.

This is the easiest way to think about the two together:

  • threshold: when fallback starts
  • stop: when fallback stops

Example:

yaml
if_not:
  threshold: 3
  stop: 2
  cmd: systemctl restart example-service

With every: 30s, that means:

  1. first failed check: do nothing
  2. second failed check: do nothing
  3. third failed check: run the command
  4. fourth failed check: run the command
  5. fifth failed check: do not run the command anymore
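
The sequence above can be modeled with a small function. This is a sketch of the documented behavior within one streak of consecutive failures, not Epazote's implementation; should_run_fallback and its arguments are hypothetical names.

```bash
# Illustration only: decide whether the fallback runs on the Nth consecutive failure.
# Arguments: consecutive failure count, threshold, stop.
should_run_fallback() {
  local failures=$1 threshold=$2 stop=$3
  (( failures >= threshold && failures - threshold < stop ))
}

# Reproduce the example: threshold 3, stop 2, five failed checks in a row.
for failures in 1 2 3 4 5; do
  if should_run_fallback "$failures" 3 2; then
    echo "failed check $failures: run the command"
  else
    echo "failed check $failures: do nothing"
  fi
done
```

Only the third and fourth failed checks print "run the command", which matches the list above.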

Example with expect.json and if_not:

yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
        data:
          activeTargets:
            - labels:
                job: DBMI-lab-nico
              health: up
      if_not:
        threshold: 3
        stop: 3
        cmd: systemctl restart vmagent

Environment variables for if_not.cmd

When Epazote runs if_not.cmd, it passes service context through EPAZOTE_* environment variables. This makes alert scripts easier to write without parsing log output.

Available variables:

  • EPAZOTE_SERVICE_NAME
  • EPAZOTE_SERVICE_TYPE (http or command)
  • EPAZOTE_URL for HTTP checks
  • EPAZOTE_TEST for command checks
  • EPAZOTE_EXPECTED_STATUS
  • EPAZOTE_ACTUAL_STATUS when available
  • EPAZOTE_ERROR
  • EPAZOTE_FAILURE_COUNT
  • EPAZOTE_THRESHOLD

Example:

yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
      if_not:
        threshold: 3
        stop: 1
        cmd: /usr/local/bin/send-alert.sh

Example script:

bash
#!/usr/bin/env bash
set -euo pipefail

printf 'service=%s\n' "${EPAZOTE_SERVICE_NAME:-}"
printf 'type=%s\n' "${EPAZOTE_SERVICE_TYPE:-}"
printf 'error=%s\n' "${EPAZOTE_ERROR:-}"
printf 'expected=%s actual=%s\n' "${EPAZOTE_EXPECTED_STATUS:-}" "${EPAZOTE_ACTUAL_STATUS:-}"
printf 'failure_count=%s threshold=%s\n' "${EPAZOTE_FAILURE_COUNT:-}" "${EPAZOTE_THRESHOLD:-}"
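
An alert script will often want a single message line instead of separate fields. The sketch below picks EPAZOTE_URL or EPAZOTE_TEST depending on EPAZOTE_SERVICE_TYPE. The function name alert_message, the message layout, and the sample values are hypothetical; only the variable names come from the list above.

```bash
# Illustration only: build a one-line alert from the EPAZOTE_* variables.
alert_message() {
  local target
  case "${EPAZOTE_SERVICE_TYPE:-}" in
    http)    target="${EPAZOTE_URL:-}" ;;
    command) target="${EPAZOTE_TEST:-}" ;;
    *)       target="unknown" ;;
  esac
  printf '%s (%s) failed %s time(s): expected %s, got %s\n' \
    "${EPAZOTE_SERVICE_NAME:-unknown}" "$target" \
    "${EPAZOTE_FAILURE_COUNT:-?}" \
    "${EPAZOTE_EXPECTED_STATUS:-?}" "${EPAZOTE_ACTUAL_STATUS:-n/a}"
}

# Simulate the environment of a failed HTTP check (sample values, not real output).
EPAZOTE_SERVICE_NAME=vmagent_targets EPAZOTE_SERVICE_TYPE=http \
EPAZOTE_URL=http://127.0.0.1:8429/api/v1/targets \
EPAZOTE_EXPECTED_STATUS=200 EPAZOTE_ACTUAL_STATUS=503 EPAZOTE_FAILURE_COUNT=3 \
  alert_message
```

The defaults after :- keep the script usable when a variable is missing, for example EPAZOTE_ACTUAL_STATUS, which is only set when available.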

Body options (json, form, text)

To send data with the request, for example with the POST method, you have three options:

  • json - Sends the data as JSON
  • form - Sends the data as a form
  • text - Sends the data as text

The headers are set automatically based on the body type, but you can change them with the headers option.

Example submitting data as JSON:

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body:
      json:
        key: value

Example submitting data as a form:

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body:
      form:
        key: value

Example submitting data as text:

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    method: POST
    body: "Hello World!"
    headers:
      content-type: text/plain

TIP

You can override the default headers by adding a headers key.

For example in the case of sending a text body, you can set the content-type to text/plain, together with other custom headers:

yaml
    headers:
      content-type: text/plain
      X-Custom-Header: TestValue

Body regular expressions

You can match the body of the response in two ways.

Without the r"..." prefix, body is treated as plain text and matched as a substring. For example, to match the word "success" in the body:

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    expect:
      status: 200
      body: success

To match with a regular expression instead, write the value as r"<your regex>":

yaml
services:
  example-service:
    every: 5m
    url: http://example.com
    expect:
      status: 200
      body: r"success|ok"

That means:

  • body: success checks whether the response contains the text success
  • body: r"success|ok" uses a raw regular expression
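
The difference between the two modes can be tried out with grep. This is only an approximation: the function name body_matches is hypothetical, and grep -E uses POSIX extended syntax, which may differ from the regex dialect Epazote supports.

```bash
# Illustration only: mimic the two body matching modes with grep.
# A value written as r"..." is treated as a regex, anything else as a plain substring.
body_matches() {
  local expected=$1 body=$2
  if [[ $expected == 'r"'*'"' ]]; then
    expected=${expected#'r"'}   # drop the leading r"
    expected=${expected%'"'}    # drop the trailing "
    grep -qE -- "$expected" <<<"$body"
  else
    grep -qF -- "$expected" <<<"$body"
  fi
}

body_matches 'success' 'deploy success' && echo "substring match"
body_matches 'r"success|ok"' 'status: ok' && echo "regex match"
body_matches 'success|ok' 'status: ok' || echo "no match: | is literal without the r prefix"
```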

If the response is JSON, prefer expect.json over regex. It is easier to read and less fragile.

JSON response matching

Use expect.json when the response is JSON and you want structural matching instead of text matching:

yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success

Nested objects are matched recursively, so you can check only the fields you care about:

yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
        data:
          activeTargets:
            - labels:
                job: DBMI-lab-nico
              health: up

Notes:

  • expect.body and expect.json are mutually exclusive
  • objects are matched as subsets, so extra fields in the response are allowed
  • array expectations match when each expected element matches at least one element in the actual response array
  • if_not works with expect.json the same way it works with expect.body
  • if_not.threshold defaults to 1, which preserves the previous behavior: the fallback action runs on the first failed check

Test command

Instead of using a URL, you can use the test key to check the exit status of a command:

yaml
services:
  example-service:
    every: 5m
    test: "pgrep -x httpd"
    expect:
      status: 0

  • test: the shell command to execute
  • status: the expected exit status of the command

Epazote runs test with the current shell from SHELL, falling back to sh. For more complex logic, prefer calling an executable script:

yaml
services:
  example-service:
    every: 5m
    test: /usr/local/bin/check-httpd.sh
    expect:
      status: 0
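
The contents of check-httpd.sh are not shown on this page. As a minimal sketch, assuming the script only needs to confirm that certain processes are running, it could look like the following. The function name check_processes and the idea of passing process names as arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check script: its exit status is what Epazote compares with expect.status.

# Succeed (status 0) only when every named process is running.
check_processes() {
  local name
  for name in "$@"; do
    if ! pgrep -x "$name" >/dev/null 2>&1; then
      printf 'process not running: %s\n' "$name" >&2
      return 1
    fi
  done
}

# Check the processes passed as arguments; the script exits with the function's status.
check_processes "$@"
```

With this sketch, the test value would pass the process names as arguments, for example test: /usr/local/bin/check-httpd.sh httpd. Called without arguments, it checks nothing and exits with status 0.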

test can also be combined with if_not to perform an action when the command fails:

yaml
services:
  example-service:
    every: 5m
    test: pgrep -x httpd
    expect:
      status: 0
      if_not:
        cmd: sudo systemctl restart httpd

Released under the BSD-3-Clause License