
Create the configuration file

By default, epazote reads its configuration from epazote.yml. Use the -c flag to load a different file:

```bash
$ epazote -c /path/to/epazote.yml
```

Basic configuration

The configuration file is a YAML file that lists the services you want to monitor. For example:

```yaml
---
services:
  app:
    url: http://0.0.0.0:8080
    every: 1m
    expect:
      status: 200
      if_not:
        cmd: systemctl restart app
```

This example monitors a service called app at http://0.0.0.0:8080. Every minute, Epazote checks the HTTP status code; if it is not 200, Epazote restarts the service by running systemctl restart app.

Mental model

For each service, Epazote does the same loop:

  1. Wait for the interval set in every
  2. Run either an HTTP check (url) or a shell check (test)
  3. Compare the result with expect
  4. If the check fails, run if_not if it is configured
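The loop above can be approximated in a few lines of shell for an HTTP service. This is only an illustration of the four steps, not Epazote's implementation: the variable names, the helper names matches and monitor, and the 10-second curl timeout are made up for this sketch.

```bash
#!/bin/sh
# Rough approximation of the per-service loop for an HTTP check.
# Illustration only: this is not how Epazote is implemented.
URL="http://127.0.0.1:8080/health"   # url
EVERY=30                             # every: 30s, expressed in seconds
EXPECTED_STATUS=200                  # expect.status
IF_NOT_CMD="systemctl restart app"   # expect.if_not.cmd

# Step 3: compare the observed result with the expected one.
matches() {
  [ "$1" = "$2" ]
}

# Steps 1-4, repeated forever; call monitor to start it.
monitor() {
  while true; do
    sleep "$EVERY"                                   # 1. wait every
    # 2. HTTP check; curl prints 000 when it gets no HTTP response
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$URL")
    if ! matches "$status" "$EXPECTED_STATUS"; then  # 3. compare with expect
      sh -c "$IF_NOT_CMD"                            # 4. run if_not
    fi
  done
}
```

Epazote itself also handles shell checks, expect.json, and the if_not threshold and stop settings, none of which this sketch models.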

In practice, most services fit one of these three patterns:

1. Check only the HTTP status code

```yaml
services:
  app:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200
```

2. Check a JSON API

```yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
```

3. Check a command exit code

```yaml
services:
  nginx_process:
    test: pgrep -x nginx
    every: 30s
    expect:
      status: 0
```
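For a shell check, the value compared with expect.status is the command's exit code. pgrep exits 0 when at least one process matches and 1 when none does, so you can preview the result by running the command yourself. In this sketch the helper name exit_code is hypothetical, and running the command through sh -c is an assumption about how test is interpreted.

```bash
#!/bin/sh
# Print the exit code of a command, i.e. the value a shell check
# would compare with expect.status. Output of the command is discarded.
exit_code() {
  sh -c "$1" >/dev/null 2>&1
  echo "$?"
}

exit_code 'pgrep -x nginx'   # 0 if an nginx process is running, 1 otherwise
```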

If the endpoint returns JSON and you want to match fields instead of raw text, use expect.json:

```yaml
---
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
```
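Before adding expect.json, it can help to look at the field you plan to match. A minimal sketch, assuming jq is installed; json_field is a hypothetical helper that only reads a top-level key, so it does not reproduce how Epazote matches nested keys or non-string values.

```bash
#!/bin/sh
# Print a top-level field from the JSON document on stdin (requires jq).
json_field() {
  jq -r --arg key "$1" '.[$key]'
}

# Preview the field that expect.json compares with "success":
# curl -s http://127.0.0.1:8429/api/v1/targets | json_field status
```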

If you want a fallback action, add if_not:

```yaml
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
      if_not:
        threshold: 3
        stop: 2
        cmd: systemctl restart vmagent
```

With these settings, Epazote will:

  • wait for 3 consecutive failures (threshold: 3) before running the command
  • after that, run the command at most 2 times (stop: 2)
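The interplay of threshold and stop is easiest to see as two counters. The sketch below is a hypothetical model, not Epazote's source: record_result is a made-up helper, and resetting both counters after a successful check is an assumption the text above does not state.

```bash
#!/bin/sh
# Hypothetical model of if_not.threshold and if_not.stop; not Epazote's source.
THRESHOLD=3   # if_not.threshold
STOP=2        # if_not.stop
failures=0    # consecutive failed checks
runs=0        # times the if_not command has run during this failure streak

# record_result ok|fail: prints "run" when the command would fire, else "skip".
record_result() {
  if [ "$1" = ok ]; then
    # Assumption: a healthy check ends the streak and resets both counters.
    failures=0
    runs=0
    echo skip
    return
  fi
  failures=$((failures + 1))
  if [ "$failures" -ge "$THRESHOLD" ] && [ "$runs" -lt "$STOP" ]; then
    runs=$((runs + 1))
    echo run
  else
    echo skip
  fi
}

# Demo in a subshell so it leaves the counters untouched.
# Five failures in a row print: skip, skip, run, run, skip (one per line).
(
  for result in fail fail fail fail fail; do
    record_result "$result"
  done
)
```

The command fires on the 3rd and 4th failures only: the 3rd reaches the threshold, and the 5th is suppressed because stop has already been reached.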

If if_not.cmd runs a script, Epazote also exports EPAZOTE_* environment variables to it, such as EPAZOTE_SERVICE_NAME, EPAZOTE_ERROR, EPAZOTE_FAILURE_COUNT, and EPAZOTE_THRESHOLD. These variables are the easiest way to build alert scripts without parsing logs.
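A minimal alert script along those lines might look like this. Only the four variable names come from the paragraph above; the script path, the message format, the helper name format_alert, and the use of logger are illustrative choices.

```bash
#!/bin/sh
# Hypothetical alert script for if_not.cmd, e.g. cmd: /usr/local/bin/epazote-alert.sh
# It only reads the EPAZOTE_* variables; the fallbacks after :- let you
# also run it by hand.

format_alert() {
  printf 'epazote: %s failed %s time(s) (threshold %s): %s\n' \
    "${EPAZOTE_SERVICE_NAME:-unknown}" \
    "${EPAZOTE_FAILURE_COUNT:-unknown}" \
    "${EPAZOTE_THRESHOLD:-unknown}" \
    "${EPAZOTE_ERROR:-no error reported}"
}

msg=$(format_alert)

# Forward the message to syslog when logger is available, and always print it.
if command -v logger >/dev/null 2>&1; then
  logger -t epazote -- "$msg" || true
fi
echo "$msg"
```

Swapping logger for a webhook call or a mailer only changes the last few lines; the EPAZOTE_* variables stay the same.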

Run epazote

From the directory that contains epazote.yml, run epazote:

```bash
$ epazote -v
```

The -v flag enables verbose output.

By default, Epazote prints human-readable logs. Use --json-logs if you want structured JSON logs instead.

In the default human-readable (pretty) mode, HTTP checks are logged as follows:

  • healthy checks are logged as compact INFO entries
  • failed expectation checks are logged as WARN entries
  • response headers are shown only for failed HTTP checks

Metrics

Once epazote is running, its metrics are exposed in Prometheus text format at http://0.0.0.0:9080/metrics.

Use the -p flag to change the port.

```bash
$ curl 0:9080/metrics
```

Example output:

```text
# HELP epazote_response_time_seconds Service response time in seconds
# TYPE epazote_response_time_seconds histogram
epazote_response_time_seconds_bucket{service_name="app",le="0.005"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.01"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.025"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.05"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.1"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.25"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="1"} 1
epazote_response_time_seconds_bucket{service_name="app",le="2.5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="10"} 1
epazote_response_time_seconds_bucket{service_name="app",le="+Inf"} 1
epazote_response_time_seconds_sum{service_name="app"} 0.000298415
epazote_response_time_seconds_count{service_name="app"} 1
# HELP epazote_status Service status (1 = OK, 0 = FAIL)
# TYPE epazote_status gauge
epazote_status{service_name="app"} 1
```
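Because the output is plain Prometheus text, standard tools can pull values out of it. A minimal sketch, assuming service_name is the only label on each series, as in the sample above; failing_services and mean_response_time are hypothetical helper names. The mean is the histogram's sum divided by its count, i.e. the average over all checks since epazote started.

```bash
#!/bin/sh
# failing_services: read metrics on stdin, print each service whose epazote_status is 0.
failing_services() {
  awk 'index($0, "epazote_status{") == 1 && $NF == 0 {
    match($0, /service_name="[^"]*"/)
    # Strip the leading service_name=" (14 chars) and the closing quote.
    print substr($0, RSTART + 14, RLENGTH - 15)
  }'
}

# mean_response_time SERVICE: read metrics on stdin, print sum / count in seconds.
mean_response_time() {
  awk -v svc="$1" '
    index($0, "epazote_response_time_seconds_sum{service_name=\"" svc "\"}") == 1 { sum = $NF }
    index($0, "epazote_response_time_seconds_count{service_name=\"" svc "\"}") == 1 { count = $NF }
    END { if (count > 0) printf "%.9f\n", sum / count }'
}

# Usage:
# curl -s http://127.0.0.1:9080/metrics | failing_services
# curl -s http://127.0.0.1:9080/metrics | mean_response_time app
```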

Released under the BSD-3-Clause License