de-p1st-monitor/TODO.md

TODOs

Public IP address

Log the public IP address. Reuse the netcup-dns Python functions.
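
A minimal sketch of the logging step, assuming a CSV file with one timestamp column and one address column. The lookup is passed in as a callable so that the netcup-dns function can be plugged in later. `fetch_public_ip_via_http`, its URL, and `log_public_ip` are placeholders for illustration, not the netcup-dns API.

```python
import csv
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def fetch_public_ip_via_http(url: str = "https://api.ipify.org") -> str:
    """Hypothetical stand-in for the netcup-dns lookup: ask a web service for our address."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("ascii").strip()


def log_public_ip(csv_file: Path, get_ip: Callable[[], str] = fetch_public_ip_via_http) -> str:
    """Append one row (UTC timestamp, public IP) to csv_file and return the IP."""
    ip = get_ip()
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with csv_file.open("a", newline="") as f:
        csv.writer(f).writerow([timestamp, ip])
    return ip
```

Calling `log_public_ip(Path("public-ip.csv"))` would append one row per run; the file name is only an example.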

Rewrite

  • easier configuration

  • easier read/write from/to csv

  • use classes & objects

  • create plots?

  • Don't emit a warning again if the value has dropped since the previous log, even if it is still above the limit

    • Example:
      • log1: 30°C OK
      • log2: 40°C Warning sent
      • log3: 35°C Still above limit, but don't send warning again as value decreased
      • log4: 37°C Send another warning: The value increased since last logging
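
The rule from the example could be sketched as follows. This is only one possible reading of it: the function name `should_warn` and the limit of 32 °C are assumptions, since the example does not state the actual limit.

```python
from typing import Optional


def should_warn(previous: Optional[float], current: float, limit: float) -> bool:
    """Decide whether a warning should be sent for the current reading."""
    if current <= limit:
        return False  # within limits: nothing to report
    if previous is None or previous <= limit:
        return True  # first reading above the limit
    return current > previous  # already above: warn again only if it got worse


# Replay the four logs from the example with an assumed limit of 32 °C.
LIMIT = 32.0
readings = [30.0, 40.0, 35.0, 37.0]
previous = None
decisions = []
for value in readings:
    decisions.append(should_warn(previous, value, LIMIT))
    previous = value
print(decisions)  # [False, True, False, True]
```

Only the previous reading has to be kept between runs, so the last row of the CSV log would be enough state.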

Use Grafana to visualize metrics

One can use Prometheus + Grafana to collect and visualize server metrics.

https://geekflare.com/best-open-source-monitoring-software/

"This list won't be complete without including two fantastic open-source solutions, Prometheus and Grafana. It's a DIY solution where you use Prometheus to scrape the metrics from server, OS, applications and use Grafana to visualize them."

Since we already collect logs, we should research how to import that data into Grafana.

Time series

E.g. CPU and memory usage, sensor data.

A time series database (TSDB) is a database explicitly designed for time series data.

Some TSDBs supported by Grafana are:

  • Graphite
  • InfluxDB
  • Prometheus

Installation

sudo docker run --rm \
  -p 3000:3000 \
  --name=grafana \
  -e "GF_INSTALL_PLUGINS=marcusolsson-json-datasource,marcusolsson-csv-datasource" \
  grafana/grafana-oss

TODO: test csv or json data import tools
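
For trying out the JSON data source, existing CSV logs could first be converted to JSON. A minimal sketch, assuming the logs start with a header row; the column names `time` and `temperature` and the function `csv_to_json` are made up for illustration and may not match the real log format.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Hypothetical log excerpt; all values stay strings after conversion.
example = "time,temperature\n2023-02-19T12:00:00,30\n2023-02-19T12:05:00,40\n"
print(csv_to_json(example))
```

Numeric columns would still need an explicit conversion (for example with `float`) before they can be plotted as numbers.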

Netdata - Can be exported to Grafana

Monit - An existing monitoring service

Monitoring all your monit instances

Setup

Install and start:

sudo pacman -S --needed monit lm_sensors smartmontools
sudo systemctl start monit
sudo systemctl status monit | grep 'Active: active (running)'

Print default configuration:

sudo grep -v '^#' /etc/monitrc
#=> set daemon 30
#=>   - A cycle is 30 seconds long.
#=> set log syslog
#=>   - We will overwrite this config value later on.
#=> set httpd port 2812
#=>   - Only listen on localhost with username admin and pwd monit.

Include monit.d:

sudo mkdir -p /etc/monit.d/
sudo grep -q '^include' /etc/monitrc || echo 'include /etc/monit.d/*' | sudo tee -a /etc/monitrc

Log to file:

sudo install -m700 /dev/stdin /etc/monit.d/log <<< 'set log /var/log/monit.log'
sudo systemctl restart monit
# tail -f /var/log/monit.log

System:

sudo install -m700 /dev/stdin /etc/monit.d/system <<< 'check system $HOST
  if filedescriptors >= 80% then alert
  if loadavg (5min) > 2 for 4 cycles then alert
  if memory usage > 75% for 4 cycles then alert
  if swap usage > 50% for 4 cycles then alert'
sudo systemctl restart monit

Filesystem:

sudo install -m700 /dev/stdin /etc/monit.d/fs <<< 'check filesystem rootfs with path /
  if space usage > 80% then alert'
sudo systemctl restart monit

SSL options:

sudo install -m700 /dev/stdin /etc/monit.d/ssl <<< '# Enable certificate verification for all SSL connections
set ssl options {
  verify: enable
}'
sudo systemctl restart monit

Mailserver, alerts and eventqueue:

sudo install -m700 /dev/stdin /etc/monit.d/mail <<< 'set mailserver smtp.mail.de
  port 465
  username "langbein@mail.de"
  password "qiXF6cUgfvSVqd0pAoFTqZEHIcUKzc3n"
  using SSL
  with timeout 20 seconds

set mail-format {
      from: langbein@mail.de
   subject: $SERVICE - $EVENT at $DATE
   message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
} 

set alert daniel@systemli.org with reminder on 10 cycles

set eventqueue basedir /var/monit'
sudo systemctl restart monit
sudo monit -v  | grep 'Mail'

Test alert:

sudo install -m700 /dev/stdin /etc/monit.d/alerttest <<< 'check file alerttest with path /.nonexistent.file'
sudo systemctl restart monit

Example script - run a speedtest:

sudo pacman -S --needed speedtest-cli
sudo install -m700 /dev/stdin /etc/monit.d/speedtest <<< 'check program speedtest with path /usr/bin/speedtest-cli
  every 120 cycles
  if status != 0 then alert'
sudo systemctl restart monit

Check config syntax:

sudo monit -t

################## TODOS ##########################

  • See Firefox bookmark folder 20230219_monit.
  • Disk health
  • BTRFS balance
  • Save disk usage and temperatures to CSV log file
    • e.g. with a monit check program entry that runs check-and-log-temp.sh
    • Or: let monit do the checks and run log-system-info.sh through check program every couple of minutes
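
The first variant could also be written in Python instead of shell. A monit check program entry can test the exit status (as in the speedtest example), so the sketch below appends the disk usage to a CSV file and returns a non-zero status once usage exceeds a limit. The function names, the CSV layout, and the 80% limit are assumptions; temperatures are left out because reading them depends on the sensor tooling.

```python
import csv
import shutil
from datetime import datetime, timezone
from pathlib import Path


def disk_usage_percent(path: str = "/") -> float:
    """Return the used space of the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_and_log(csv_file: Path, percent: float, limit: float = 80.0) -> int:
    """Append (UTC timestamp, percent) to csv_file; return 1 if above limit, else 0."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with csv_file.open("a", newline="") as f:
        csv.writer(f).writerow([timestamp, f"{percent:.1f}"])
    return 1 if percent > limit else 0
```

A script run by monit would end with something like `sys.exit(check_and_log(Path("/var/log/disk-usage.csv"), disk_usage_percent("/")))`, where the log path is again only an example.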

Monit behind Nginx

TODO: Nginx reverse proxy with basic authentication.