At LMAX Exchange, Nagios is one of our essential tools for monitoring and verifying the operation of our systems. We use it for three distinct purposes:
- Alerting when things break.
- Recording trends so that we can predict when problems will occur and then mitigate them.
- Verifying the overall structure of our environments.
Things have broken
Alerting when things break is perhaps the most common use case for Nagios. These checks need to run often, perhaps every few seconds. Let us look at an example, a web server, and some of the tests we might [...]