About
The problem to solve
Keeping distributed applications reliable and available is a growing challenge for DevOps teams and developers. Microservices architectures, while powerful and scalable, introduce complexity: each service operates independently and often relies on HTTP endpoints to communicate, and a failure in one service can cascade across the system. Monitoring these services manually or with traditional tools is inefficient and error-prone, which leads to the following problems:
Service Downtime Goes Unnoticed: Without constant monitoring, HTTP endpoints can fail silently by returning unexpected status codes, serving expired SSL certificates, or responding slowly, and teams remain unaware until users report issues.
Manual Recovery is Slow and Reactive: When a service fails, teams must manually identify the issue, log in to servers, and execute recovery commands (e.g., restarts or signals), wasting valuable time and increasing downtime.
Configuration Overload: Existing monitoring tools often require complex setups, custom scripting, or learning new syntax, adding overhead to already burdened teams who need simplicity and integration with tools like Ansible or SaltStack.
Dynamic Environments are Hard to Track: In CI/CD pipelines, services and their endpoints change frequently. Static monitoring solutions struggle to adapt without restarts or manual updates, disrupting deployment workflows.
Resource Inefficiency: Many tools overanalyze responses (e.g., reading full response bodies) or lack fine-tuned control, consuming unnecessary resources when only lightweight checks are needed.
These challenges result in delayed responses to failures, increased operational costs, and frustrated users. Epazote.io was created to address this gap: to provide an automated, lightweight, and adaptive solution that keeps microservices healthy with minimal effort, ensuring they stay up and running efficiently in any environment.
How it works
epazote is an automated HTTP microservices supervisor that keeps services running by checking their endpoints and taking recovery actions if needed.
It works by sending HTTP requests to service URLs at set intervals, checking responses for status codes, headers, or body content, and executing commands like restarts if something goes wrong.
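The check-and-recover cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not epazote's implementation: the names Expectation, is_healthy, fetch, and check_once are hypothetical, and the expectation fields do not mirror the real epazote.yml schema.

```python
import subprocess
import urllib.error
import urllib.request
from dataclasses import dataclass, field


@dataclass
class Expectation:
    """Hypothetical description of a healthy response (not epazote's schema)."""
    status: int = 200
    headers: dict = field(default_factory=dict)  # header name -> required value
    body_contains: str = ""                      # substring the body must include


def is_healthy(status, headers, body, expect):
    """Return True when the response matches every stated expectation."""
    if status != expect.status:
        return False
    lowered = {k.lower(): v for k, v in headers.items()}
    for name, value in expect.headers.items():
        if lowered.get(name.lower()) != value:
            return False
    return expect.body_contains in body


def fetch(url, timeout=5.0):
    """Perform one GET; return (status, headers, body). Status 0 means unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code, headers, and a body.
        return err.code, dict(err.headers), err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError):
        return 0, {}, ""


def check_once(url, expect, recovery_cmd, fetcher=fetch, runner=subprocess.run):
    """Run a single check and execute the recovery command only on failure."""
    status, headers, body = fetcher(url)
    healthy = is_healthy(status, headers, body, expect)
    if not healthy:
        runner(recovery_cmd, check=False)
    return healthy
```

Calling check_once in a loop with a sleep between iterations gives the "at set intervals" behaviour. The fetcher and runner parameters exist only so the sketch can be exercised without a network or a real service.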
You configure it using a YAML file called epazote.yml, where you define services, expected responses, and recovery actions.
It can run in two modes: on the same server as the services, where it can fix issues directly, or on a separate host, where it only monitors and alerts. This makes it flexible for different setups.
It can also be used to check the output of a command or script (TEST), making it versatile for various use cases.
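Checking a command instead of a URL follows the same pattern: run it, then compare the exit code and output with what is expected. The sketch below assumes a hypothetical helper named command_passes and is not taken from epazote itself.

```python
import subprocess
import sys


def command_passes(argv, expect_exit=0, expect_in_output="", timeout=10.0):
    """Run a command and compare its exit code and stdout to expectations."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a hung command counts as a failed check.
        return False
    return result.returncode == expect_exit and expect_in_output in result.stdout


if __name__ == "__main__":
    # Example: a tiny script that prints "ready" and exits with code 0.
    print(command_passes([sys.executable, "-c", "print('ready')"], expect_in_output="ready"))
```

Passing the command as an argument list rather than a shell string avoids quoting problems. A real supervisor may well accept shell syntax instead.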
It listens on port 9080 by default and provides a /metrics endpoint for Prometheus to scrape.
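To inspect the endpoint without a Prometheus server, a small script can fetch it and parse the text exposition format. This is a rough sketch: parse_metrics and scrape are hypothetical helpers, the URL assumes epazote runs locally on the default port, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Simplification: assumes each sample line is "<series> <value>"
    with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def scrape(url="http://localhost:9080/metrics", timeout=5.0):
    """Fetch the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The metric names returned depend entirely on what epazote exports, so the sketch makes no assumption about them.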