TODOs
HDD power status
Log whether the HDD is active/idle or spun down.
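A minimal sketch of how the power state could be logged. It assumes that hdparm is installed (it is not part of the packages installed under Setup) and that `hdparm -C` prints a line like ` drive state is:  standby`. The function names, the device and the log path are hypothetical.

```shell
# Sketch: log the drive power state to a CSV file.
# Assumption: `hdparm -C` prints a line such as " drive state is:  standby".

# Read `hdparm -C` output on stdin and print only the state.
parse_drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}

# Append "<timestamp>,<device>,<state>" to a CSV log (paths are examples).
log_drive_state() {
    local device="$1"
    local logfile="$2"
    local state
    state="$(hdparm -C "$device" | parse_drive_state)"
    printf '%s,%s,%s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$device" "${state:-unknown}" >> "$logfile"
}

# Usage (as root): log_drive_state /dev/sda /var/log/hdd-power.csv
```

Keeping the parsing in its own function makes it possible to check it against sample output without touching a real disk.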
Public IP address logging
Log the public IP address. Reuse the netcup-dns Python functions.
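Until the netcup-dns functions are wired in, a shell-only variant might look like the sketch below. It does not use netcup-dns. It assumes the address is fetched from a public plain-text service (https://api.ipify.org is used as an example), and the function names and log path are hypothetical.

```shell
# Sketch: log the public IPv4 address only when it changes.

# Loose check that the argument looks like an IPv4 address.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Append "<timestamp>,<ip>" unless the last logged address is the same.
log_ip_if_changed() {
    local ip="$1"
    local logfile="$2"
    local last=""
    is_ipv4 "$ip" || return 1
    [ -f "$logfile" ] && last="$(tail -n 1 "$logfile" | cut -d, -f2)"
    [ "$ip" = "$last" ] && return 0
    printf '%s,%s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$ip" >> "$logfile"
}

# Usage (service URL is an assumption):
#   log_ip_if_changed "$(curl -fsS https://api.ipify.org)" /var/log/public-ip.csv
```

Logging only on change keeps the file small and makes each line an address-change event.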
Use Grafana to visualize metrics
One can use Prometheus + Grafana to collect and visualize server metrics.
Quote from https://geekflare.com/best-open-source-monitoring-software/:
"This list won’t be complete without including two fantastic open-source solutions – Prometheus and Grafana. Its DIY solution where you use Prometheus to scrape the metrics from server, OS, applications and use Grafana to visualize them."
Since we already collect logs, we should research how to import that data into Grafana.
Time series
Examples of time series data are CPU and memory usage or sensor data.
A time series database (TSDB) is a database explicitly designed for time series data.
Some TSDBs supported by Grafana are:
- Graphite
- InfluxDB
- Prometheus
Installation
- https://grafana.com/docs/grafana/latest/setup-grafana/installation/docker/#alpine-image-recommended
- https://grafana.com/grafana/plugins/marcusolsson-csv-datasource/?tab=installation
- https://grafana.com/grafana/plugins/marcusolsson-json-datasource/?tab=installation
sudo docker run --rm \
-p 3000:3000 \
--name=grafana \
-e "GF_INSTALL_PLUGINS=marcusolsson-json-datasource,marcusolsson-csv-datasource" \
grafana/grafana-oss
TODO: Test the CSV and JSON data source plugins.
Netdata - Can be exported to Grafana
Monit - An existing monitoring service
General notes and links
- Monit is a widely used service for system monitoring.
  - OPNsense uses Monit: https://docs.opnsense.org/manual/monit.html
- Short slideshow presentation: https://mmonit.com/monit/#slideshow
- Excellent configuration and usage summary in the Arch Linux Wiki: https://wiki.archlinux.org/title/Monit
- Examples
  - https://mmonit.com/wiki/Monit/ConfigurationExamples
    - One can use the return code or stdout of an executed shell script.
    - https://mmonit.com/wiki/Monit/ConfigurationExamples#HDDHealth
          check program HDD_Health with path "/usr/local/etc/monit/scripts/sdahealth.sh"
              every 120 cycles
              if content != "PASSED" then alert
              # if status > 0 then alert
              group health
- Documentation
  - Event queue: store events (notifications) if the mail server is not reachable
        set eventqueue basedir /var/monit
  - https://mmonit.com/monit/documentation/monit.html#SPACE-USAGE-TEST
        check filesystem rootfs with path /
            if space usage > 90% then alert

        check program myscript with path /usr/local/bin/myscript.sh
            if status != 0 then alert
  - https://mmonit.com/monit/documentation/monit.html#PROGRAM-OUTPUT-CONTENT-TEST
  - https://mmonit.com/monit/documentation/monit.html#Link-upload-and-download-bytes
        check network eth0 with interface eth0
            if upload > 500 kB/s then alert
            if total downloaded > 1 GB in last 2 hours then alert
            if total downloaded > 10 GB in last day then alert
- https://mmonit.com/monit/documentation/monit.html#MANAGE-YOUR-MONIT-INSTANCES
  Monitoring all your Monit instances
  - Monit itself only monitors the system it runs on.
  - Multi-server monitoring is a paid extra service called M/Monit :/
  - But there are other open source services for this.
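The HDD_Health example listed under Examples relies on a script whose stdout contains PASSED. The upstream sdahealth.sh is not reproduced here. The following is only a sketch of what such a script could do, assuming the ATA-style output line of `smartctl -H` ("SMART overall-health self-assessment test result: PASSED"). The function name is hypothetical.

```shell
# Sketch: turn `smartctl -H` output into a verdict that Monit can test
# with either `if content != "PASSED"` or `if status != 0`.

# Read smartctl output on stdin; print PASSED (exit 0) or FAILED (exit 1).
smart_health_verdict() {
    if grep -q 'test result: PASSED'; then
        echo PASSED
    else
        echo FAILED
        return 1
    fi
}

# Usage inside the script (device is an example):
#   smartctl -H /dev/sda | smart_health_verdict
```

Setting both stdout and the exit status lets the Monit check use the content test or the status test interchangeably.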
Setup
Install and start:
sudo pacman -S --needed monit lm_sensors smartmontools
sudo systemctl start monit
sudo systemctl status monit | grep 'Active: active (running)'
Print default configuration:
sudo cat /etc/monitrc | grep -v '^#'
#=> set daemon 30
#=> - A cycle is 30 seconds long.
#=> set log syslog
#=> - We will overwrite this config value later on.
#=> set httpd port 2812
#=> - Listens on localhost only, with username admin and password monit.
Include monit.d:
sudo mkdir -p /etc/monit.d/
sudo grep -q '^include' /etc/monitrc || echo 'include /etc/monit.d/*' | sudo tee -a /etc/monitrc
Log to file:
sudo install -m700 /dev/stdin /etc/monit.d/log <<< 'set log /var/log/monit.log'
sudo systemctl restart monit
# tail -f /var/log/monit.log
System:
sudo install -m700 /dev/stdin /etc/monit.d/system <<< 'check system $HOST
    if filedescriptors >= 80% then alert
    if loadavg (5min) > 2 for 4 cycles then alert
    if memory usage > 75% for 4 cycles then alert
    if swap usage > 50% for 4 cycles then alert'
sudo systemctl restart monit
Filesystem:
sudo install -m700 /dev/stdin /etc/monit.d/fs <<< 'check filesystem rootfs with path /
if space usage > 80% then alert'
sudo systemctl restart monit
SSL options:
sudo install -m700 /dev/stdin /etc/monit.d/ssl <<< '# Enable certificate verification for all SSL connections
set ssl options {
verify: enable
}'
sudo systemctl restart monit
Mailserver, alerts and eventqueue:
- https://mmonit.com/monit/documentation/monit.html#Setting-a-mail-server-for-alert-delivery
- https://mmonit.com/monit/documentation/monit.html#Setting-an-error-reminder
- https://mmonit.com/monit/documentation/monit.html#Event-queue
- If no mail server is available, Monit can queue events in the local file-system for retry until the mail server recovers.
- By default, the queue is disabled and if the alert handler fails, Monit will simply drop the alert message.
sudo install -m700 /dev/stdin /etc/monit.d/mail <<< 'set mailserver smtp.mail.de
    port 465
    username "langbein@mail.de"
    password "YOUR_SMTP_PASSWORD"
    using SSL
    with timeout 20 seconds
set mail-format {
    from: langbein@mail.de
    subject: $SERVICE - $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}
set alert daniel@systemli.org with reminder on 10 cycles
set eventqueue basedir /var/monit'
sudo systemctl restart monit
sudo monit -v | grep 'Mail'
Test alert:
- https://wiki.ubuntuusers.de/Monit/#E-Mail-Benachrichtigungen-testen
- It is enough to restart Monit. It will send an email saying that its state has changed (stopped/started).
- But if desired, one can also create a test for a non-existing file:
sudo install -m700 /dev/stdin /etc/monit.d/alerttest <<< 'check file alerttest with path /.nonexistent.file'
sudo systemctl restart monit
Example script - run a speedtest:
sudo pacman -S --needed speedtest-cli
sudo install -m700 /dev/stdin /etc/monit.d/speedtest <<< 'check program speedtest with path /usr/bin/speedtest-cli
every 120 cycles
if status != 0 then alert'
sudo systemctl restart monit
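How often the speedtest runs depends on the cycle length. With the default `set daemon 30` shown above, `every 120 cycles` works out to one run per hour. A small sketch of the arithmetic (the helper name is hypothetical):

```shell
# Convert "every N cycles" into minutes, given the cycle length in seconds
# from "set daemon". Arguments: <cycles> <seconds per cycle>.
cycles_to_minutes() {
    echo $(( $1 * $2 / 60 ))
}

cycles_to_minutes 120 30   # prints 60
```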
Check config syntax:
sudo monit -t
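Since a syntax error in one of the files under /etc/monit.d/ can keep Monit from starting, the syntax check can be combined with the restart. A minimal sketch, with a hypothetical function name:

```shell
# Restart Monit only if the control file (including /etc/monit.d/*) parses.
safe_restart() {
    if sudo monit -t; then
        sudo systemctl restart monit
    else
        echo 'monit -t failed, not restarting' >&2
        return 1
    fi
}

# Usage: safe_restart
```

This could replace the plain `sudo systemctl restart monit` after each configuration change.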
################## TODOS ##########################
- See Firefox bookmark folder 20230219_monit.
- Disk health
- BTRFS balance
- Save disk usage and temperatures to CSV log file
  - e.g. by using a check program check-and-log-temp.sh monit configuration
  - Or: do checks by monit and every couple of minutes run check program log-system-info.sh
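A possible starting point for the CSV logging TODO is sketched below. It is not an existing script. It assumes the temperature can be read from the first sysfs thermal zone (in millidegrees Celsius), which may differ per machine, and it leaves the column empty when that file is missing. All names and the column layout are hypothetical.

```shell
# Sketch: append one CSV row with timestamp, rootfs usage and temperature.

# Print the used space of a mount point in percent, without the % sign.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Convert millidegrees Celsius (as found in sysfs) to degrees Celsius.
millideg_to_celsius() {
    awk -v m="$1" 'BEGIN { printf "%.1f", m / 1000 }'
}

log_system_info() {
    local logfile="$1"
    local temp_raw
    local temp_c=""
    # Write the header once, when the file is missing or empty.
    [ -s "$logfile" ] || echo 'time,rootfs_used_percent,cpu_temp_celsius' >> "$logfile"
    temp_raw="$(cat /sys/class/thermal/thermal_zone0/temp 2>/dev/null || true)"
    [ -n "$temp_raw" ] && temp_c="$(millideg_to_celsius "$temp_raw")"
    printf '%s,%s,%s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" \
        "$(disk_usage_percent /)" "$temp_c" >> "$logfile"
}

# Usage: log_system_info /var/log/system-info.csv
```

A file with a header row and a time column in this shape is also a plausible input for testing the Grafana CSV data source plugin mentioned above.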
Monit behind Nginx
TODO: Nginx reverse proxy with basic authentication.