Create the configuration file
By default, epazote reads its configuration from epazote.yml. You can specify a different file with the -c flag:
$ epazote -c /path/to/epazote.yml
Basic configuration
The configuration file is a YAML file that lists the services you want to monitor. Here is an example:
---
services:
  app:
    url: http://0.0.0.0:8080
    every: 1m
    expect:
      status: 200
      if_not:
        cmd: systemctl restart app
In this example, we monitor a service called app that runs on http://0.0.0.0:8080. Epazote checks the status code every minute and, if it is not 200, restarts the service with systemctl restart app.
Mental model
For each service, Epazote does the same loop:
- Wait every
- Run either an HTTP check with url or a shell check with test
- Compare the result with expect
- If the check fails, run if_not if it is configured
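The loop above can be pictured with a small shell sketch. This is not Epazote's implementation, only an illustration of the four steps for a test-style check. The helper name check_once and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: Epazote does not run this code.
# check_once is a hypothetical helper that performs one pass of the loop
# for a shell check (the "test" form).
#
# usage: check_once EXPECTED_STATUS FALLBACK_CMD CHECK_CMD [ARGS...]
check_once() {
    expected=$1
    fallback=$2
    shift 2

    # Run the check and capture its exit code.
    actual=0
    "$@" >/dev/null 2>&1 || actual=$?

    # Compare the result with the expectation.
    if [ "$actual" -eq "$expected" ]; then
        return 0
    fi

    # The check failed: run the fallback only if one is configured.
    if [ -n "$fallback" ]; then
        sh -c "$fallback"
    fi
    return 1
}

# The full loop would then look like this (commented out so the sketch
# can be sourced without blocking):
#
#   while :; do
#       sleep 30                                            # every: 30s
#       check_once 0 'echo "nginx is down"' pgrep -x nginx || true
#   done
```

An HTTP check follows the same shape, with the status code of the response taking the place of the exit code.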
In practice, most services fit one of these three patterns:
1. Check only the HTTP status code
services:
  app:
    url: http://127.0.0.1:8080/health
    every: 30s
    expect:
      status: 200
2. Check a JSON API
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
3. Check a command exit code
services:
  nginx_process:
    test: pgrep -x nginx
    every: 30s
    expect:
      status: 0
If the endpoint returns JSON and you want to match fields instead of raw text, use expect.json:
---
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
If you want a fallback action, add if_not:
services:
  vmagent_targets:
    url: http://127.0.0.1:8429/api/v1/targets
    every: 30s
    expect:
      status: 200
      json:
        status: success
      if_not:
        threshold: 3
        stop: 2
        cmd: systemctl restart vmagent
That means:
- wait for 3 consecutive failures before running the command
- after that, run the command at most 2 times
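One way to read these two rules is as a pair of counters. The sketch below is an assumption about the behaviour, based only on the two bullets above, and is not Epazote's code: the command runs when the number of consecutive failures has reached threshold and it has run fewer than stop times so far. The function names are hypothetical, and resetting the counters after a successful check is not modelled.

```shell
#!/bin/sh
# Illustrative sketch only; the exact semantics inside Epazote may differ.
#
# usage: should_run_fallback FAILURES THRESHOLD STOP RUNS
# Succeeds (exit 0) when the fallback command should run now.
should_run_fallback() {
    # $1 = consecutive failures, $2 = threshold, $3 = stop, $4 = runs so far
    [ "$1" -ge "$2" ] && [ "$4" -lt "$3" ]
}

# Simulate TOTAL consecutive failed checks and print the failure numbers
# at which the command would run.
#
# usage: simulate_failures TOTAL THRESHOLD STOP
simulate_failures() {
    total=$1
    threshold=$2
    stop=$3
    ran=0
    n=1
    while [ "$n" -le "$total" ]; do
        if should_run_fallback "$n" "$threshold" "$stop" "$ran"; then
            printf 'failure %s: run if_not.cmd\n' "$n"
            ran=$((ran + 1))
        fi
        n=$((n + 1))
    done
}

# Example: simulate_failures 6 3 2
```

Under this reading, with threshold: 3 and stop: 2, six consecutive failures would trigger the command on the third and fourth failure and then leave the service alone.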
If you use a script in if_not.cmd, Epazote also exports EPAZOTE_* environment variables such as EPAZOTE_SERVICE_NAME, EPAZOTE_ERROR, EPAZOTE_FAILURE_COUNT, and EPAZOTE_THRESHOLD. That is the easiest way to build alert scripts without parsing logs.
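A minimal sketch of such an alert script is shown below. It uses only the four variables named above; the function name format_alert, the message layout, and the fallback defaults for missing variables are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical alert script intended for if_not.cmd.
# The EPAZOTE_* variables are the ones listed in the text; the defaults
# after ":-" only exist so the script can also be run by hand.
format_alert() {
    printf '%s failed %s time(s) (threshold %s): %s\n' \
        "${EPAZOTE_SERVICE_NAME:-unknown}" \
        "${EPAZOTE_FAILURE_COUNT:-?}" \
        "${EPAZOTE_THRESHOLD:-?}" \
        "${EPAZOTE_ERROR:-no error message}"
}

# Deliver the alert. This sketch only prints it to stderr; replace this
# line with mail, a webhook call, or whatever channel you use.
format_alert >&2
```

To use it, save the script somewhere executable (for example a hypothetical /usr/local/bin/epazote-alert.sh) and point if_not.cmd at that path.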
Run epazote
From the directory that contains the epazote.yml file, run epazote:
$ epazote -v
The -v flag enables verbose output.
By default, Epazote prints human-readable logs. Use --json-logs if you want structured JSON logs instead.
For HTTP checks in pretty mode:
- healthy checks are logged as compact INFO entries
- failed expectation checks are logged as WARN entries
- response headers are shown only for failed HTTP checks
Metrics
After starting epazote, you can access the metrics at http://0.0.0.0:9080/metrics
You can change the port using the -p flag.
$ curl 0:9080/metrics
Output example:
# HELP epazote_response_time_seconds Service response time in seconds
# TYPE epazote_response_time_seconds histogram
epazote_response_time_seconds_bucket{service_name="app",le="0.005"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.01"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.025"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.05"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.1"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.25"} 1
epazote_response_time_seconds_bucket{service_name="app",le="0.5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="1"} 1
epazote_response_time_seconds_bucket{service_name="app",le="2.5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="5"} 1
epazote_response_time_seconds_bucket{service_name="app",le="10"} 1
epazote_response_time_seconds_bucket{service_name="app",le="+Inf"} 1
epazote_response_time_seconds_sum{service_name="app"} 0.000298415
epazote_response_time_seconds_count{service_name="app"} 1
# HELP epazote_status Service status (1 = OK, 0 = FAIL)
# TYPE epazote_status gauge
epazote_status{service_name="app"} 1
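Because the endpoint serves plain text in the format shown above, a script can read a single value without a Prometheus server. The sketch below assumes that output format and that service names contain no spaces; service_status is a hypothetical helper name.

```shell
#!/bin/sh
# Print the epazote_status gauge value for one service, reading the
# metrics text from stdin. Prints nothing if the service is not listed.
#
# usage: service_status SERVICE_NAME < metrics.txt
service_status() {
    awk -v svc="$1" '
        # Match the exact sample line, e.g. epazote_status{service_name="app"} 1
        $1 == ("epazote_status{service_name=\"" svc "\"}") { print $2 }
    '
}

# Example against a running instance (default port from the text):
#
#   curl -s http://0.0.0.0:9080/metrics | service_status app
```

A value of 1 means the last check passed and 0 means it failed, matching the HELP line in the output.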