mirror of
https://codeberg.org/privacy1st/de-p1st-monitor
synced 2024-11-21 19:33:18 +01:00
269 lines
8.1 KiB
Markdown
# TODOs

## ~~digitemp temperature logging~~

~~Done through generic sensor_script logger.~~

## Public IP address logging

Log the public IP address. Reuse `netcup-dns` Python functions.

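A minimal sketch of what such a logger could record, assuming a CSV log with `timestamp,ip` rows and that a new row is only written when the address changes. The function name `log_public_ip` and the CSV layout are assumptions for illustration; they are not taken from `netcup-dns`.

```shell
# Append "timestamp,ip" to a CSV log, but only if the IP differs from the last logged one.
# Hypothetical helper; the two-column CSV layout is an assumption.
log_public_ip() {
    ip=$1
    logfile=$2
    # Refuse to log an empty value (e.g. when the IP lookup failed).
    [ -n "$ip" ] || return 1
    # Last logged IP (empty if the log does not exist yet or has no rows).
    last=$(tail -n 1 "$logfile" 2>/dev/null | cut -d, -f2)
    if [ "$ip" != "$last" ]; then
        printf '%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$ip" >> "$logfile"
    fi
}
```

It could be fed by any public IP lookup, for example `log_public_ip "$(curl -fsS https://api.ipify.org)" public-ip.csv`; the lookup service and the file name are assumptions as well.
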
## ~~Rewrite~~

* ~~easier configuration~~
* ~~easier read/write from/to csv~~
* ~~use classes & objects~~

* ~~create plots?~~

* ~~Don't send a warning again if one was already sent during the previous log and the value has not increased since~~
  * ~~Example:~~
    * ~~log1: 30°C OK~~
    * ~~log2: 40°C Warning sent~~
    * ~~log3: 35°C Still above limit, but don't send warning again as value decreased~~
    * ~~log4: 37°C Send another warning: The value increased since last logging~~

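The rule from the example can be sketched as a small shell function. This is only an illustration of the described behaviour, not the project's implementation: the name `should_warn`, the integer-only comparison, and the limit of 33 (some value between 30 and 35) are assumptions.

```shell
# Decide whether to send a warning for the current value.
# Arguments: previous value, current value, limit (integers only).
# Returns 0 (send) only if the value is above the limit AND higher than at the previous log.
should_warn() {
    prev=$1
    curr=$2
    limit=$3
    [ "$curr" -gt "$limit" ] && [ "$curr" -gt "$prev" ]
}

# Replay the four log entries from the example with an assumed limit of 33.
prev=0
for value in 30 40 35 37; do
    if should_warn "$prev" "$value" 33; then
        echo "$value: warning sent"
    else
        echo "$value: no warning"
    fi
    prev=$value
done
```

With these inputs a warning is sent for 40 and 37, but not for 30 (below the limit) or 35 (above the limit, but lower than the previous value).
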
## Use Grafana to visualize metrics

One can use Prometheus + Grafana to collect and visualize server metrics.

> https://geekflare.com/best-open-source-monitoring-software/
> This list won’t be complete without including two fantastic open-source solutions – Prometheus and Grafana. Its DIY solution where you use Prometheus to scrape the metrics from server, OS, applications and use Grafana to visualize them.

As we already collect logs, we should research how to import that data into Grafana.

### Time series

* https://grafana.com/docs/grafana/latest/fundamentals/timeseries/#introduction-to-time-series

  E.g. CPU and memory usage, sensor data.

* https://grafana.com/docs/grafana/latest/fundamentals/timeseries/#time-series-databases

  A time series database (TSDB) is a database explicitly designed for time series data.

  Some supported TSDBs are:

  * Graphite
  * InfluxDB
  * Prometheus

### Installation

* https://grafana.com/docs/grafana/latest/setup-grafana/installation/docker/#alpine-image-recommended
* https://grafana.com/docs/grafana/latest/setup-grafana/installation/docker/#install-official-and-community-grafana-plugins

* https://grafana.com/grafana/plugins/marcusolsson-csv-datasource/?tab=installation
  * https://grafana.github.io/grafana-csv-datasource/
* https://grafana.com/grafana/plugins/marcusolsson-json-datasource/?tab=installation
  * https://grafana.github.io/grafana-json-datasource/

```shell
sudo docker run --rm \
  -p 3000:3000 \
  --name=grafana \
  -e "GF_INSTALL_PLUGINS=marcusolsson-json-datasource,marcusolsson-csv-datasource" \
  grafana/grafana-oss
```

TODO: Test the CSV or JSON data import tools.

## Netdata - Can be exported to Grafana

* https://github.com/netdata/netdata/blob/master/docs/getting-started/introduction.md

## Monit - An existing monitoring service

### General notes and links

* Monit is a widely used service for system monitoring.
* OPNsense uses Monit: https://docs.opnsense.org/manual/monit.html

* Short slideshow presentation: https://mmonit.com/monit/#slideshow
* https://wiki.ubuntuusers.de/Monit/

* Excellent configuration and usage summary in the Arch Linux Wiki: https://wiki.archlinux.org/title/Monit

* Examples
  * https://mmonit.com/wiki/Monit/ConfigurationExamples
  * One can use the return code or stdout of an executed shell script
    * https://mmonit.com/wiki/Monit/ConfigurationExamples#HDDHealth
      ```
      check program HDD_Health with path "/usr/local/etc/monit/scripts/sdahealth.sh"
        every 120 cycles
        if content != "PASSED" then alert
        # if status > 0 then alert
        group health
      ```
* Documentation
  * Event queue - Store events (notifications) if mail server is not reachable
    * https://mmonit.com/monit/documentation/monit.html#Event-queue
      ```
      set eventqueue basedir /var/monit
      ```
  * https://mmonit.com/monit/documentation/monit.html#SPACE-USAGE-TEST
    ```
    check filesystem rootfs with path /
      if space usage > 90% then alert
    ```
  * https://mmonit.com/monit/documentation/monit.html#PROGRAM-STATUS-TEST
    ```
    check program myscript with path /usr/local/bin/myscript.sh
      if status != 0 then alert
    ```
  * https://mmonit.com/monit/documentation/monit.html#PROGRAM-OUTPUT-CONTENT-TEST
  * https://mmonit.com/monit/documentation/monit.html#Link-upload-and-download-bytes
    ```
    check network eth0 with interface eth0
      if upload > 500 kB/s then alert
      if total downloaded > 1 GB in last 2 hours then alert
      if total downloaded > 10 GB in last day then alert
    ```

  * https://mmonit.com/monit/documentation/monit.html#MANAGE-YOUR-MONIT-INSTANCES

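The HDD health example above alerts whenever the script output is not `PASSED`. A sketch of how such a script could extract that verdict, assuming `smartctl -H` from smartmontools prints its usual `SMART overall-health self-assessment test result: ...` line for ATA disks. The function name `smart_health` is hypothetical and this is not the script from the Monit wiki.

```shell
# Read `smartctl -H` output on stdin and print only the health verdict (e.g. PASSED).
# Prints UNKNOWN if no verdict line is found.
smart_health() {
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p')
    echo "${result:-UNKNOWN}"
}

# Example with a canned line instead of a real disk:
echo 'SMART overall-health self-assessment test result: PASSED' | smart_health
```

On a real system the input would come from something like `sudo smartctl -H /dev/sda | smart_health`; the device name is an assumption.
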
### Monitoring all your monit instances

* Monit itself only monitors the local system
* Multi-server monitoring is a paid extra service called M/Monit :/
* But there are other open source services for this
  * https://github.com/monmon-io/monmon#why-did-you-create-monmon

### Setup

Install and start:

```shell
sudo pacman -S --needed monit lm_sensors smartmontools
sudo systemctl start monit
sudo systemctl status monit | grep 'Active: active (running)'
```

Print default configuration:

```shell
sudo grep -v '^#' /etc/monitrc
#=> set daemon 30
#=>   - A cycle is 30 seconds long.
#=> set log syslog
#=>   - We will overwrite this config value later on.
#=> set httpd port 2812
#=>   - Only listen on localhost with username admin and password monit.
```

Include `monit.d`:

```shell
sudo mkdir -p /etc/monit.d/
# Append the include directive only if no include line exists yet.
sudo grep -q '^include' /etc/monitrc || echo 'include /etc/monit.d/*' | sudo tee -a /etc/monitrc
```

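The "append only if missing" step can also be written as a reusable function. This is a sketch under the assumption that matching the exact line is wanted; the command above is looser and skips the append if any line starting with `include` exists. The name `append_once` is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    # -x: match whole lines, -F: treat the pattern as a fixed string, -q: quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a temporary file instead of /etc/monitrc:
tmp=$(mktemp)
append_once 'include /etc/monit.d/*' "$tmp"
append_once 'include /etc/monit.d/*' "$tmp"
cat "$tmp"   # the include line appears only once
rm -f "$tmp"
```

Writing to `/etc/monitrc` itself needs root, so there the function would have to run in a root shell, or the `sudo tee -a` form above stays the simpler choice.
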
Log to file:

```shell
sudo install -m700 /dev/stdin /etc/monit.d/log <<< 'set log /var/log/monit.log'
sudo systemctl restart monit
# tail -f /var/log/monit.log
```

System:

```shell
sudo install -m700 /dev/stdin /etc/monit.d/system <<< 'check system $HOST
  if filedescriptors >= 80% then alert
  if loadavg (5min) > 2 for 4 cycles then alert
  if memory usage > 75% for 4 cycles then alert
  if swap usage > 50% for 4 cycles then alert'
sudo systemctl restart monit
```

Filesystem:

```shell
sudo install -m700 /dev/stdin /etc/monit.d/fs <<< 'check filesystem rootfs with path /
  if space usage > 80% then alert'
sudo systemctl restart monit
```

SSL options:

* https://mmonit.com/monit/documentation/monit.html#SSL-OPTIONS

```shell
sudo install -m700 /dev/stdin /etc/monit.d/ssl <<< '# Enable certificate verification for all SSL connections
set ssl options {
  verify: enable
}'
sudo systemctl restart monit
```

Mailserver, alerts and eventqueue:

* https://mmonit.com/monit/documentation/monit.html#Setting-a-mail-server-for-alert-delivery
* https://mmonit.com/monit/documentation/monit.html#Setting-an-error-reminder
* https://mmonit.com/monit/documentation/monit.html#Event-queue
  * If no mail server is available, Monit can queue events in the local file-system for retry until the mail server recovers.
  * By default, the queue is disabled and if the alert handler fails, Monit will simply drop the alert message.

```shell
# Replace <password> with the password of the sending mail account.
sudo install -m700 /dev/stdin /etc/monit.d/mail <<< 'set mailserver smtp.mail.de
  port 465
  username "langbein@mail.de"
  password "<password>"
  using SSL
  with timeout 20 seconds

set mail-format {
  from: langbein@mail.de
  subject: $SERVICE - $EVENT at $DATE
  message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}

set alert daniel@systemli.org with reminder on 10 cycles

set eventqueue basedir /var/monit'
sudo systemctl restart monit
sudo monit -v | grep 'Mail'
```

Test alert:

* https://wiki.ubuntuusers.de/Monit/#E-Mail-Benachrichtigungen-testen
  * It is enough to restart Monit. It will send an email that its state has changed (stopped/started).
  * But if desired, one can also create a test for a non-existing file:

```shell
sudo install -m700 /dev/stdin /etc/monit.d/alerttest <<< 'check file alerttest with path /.nonexistent.file'
sudo systemctl restart monit
```

Example script - run a speedtest:

```shell
sudo pacman -S --needed speedtest-cli
sudo install -m700 /dev/stdin /etc/monit.d/speedtest <<< 'check program speedtest with path /usr/bin/speedtest-cli
  every 120 cycles
  if status != 0 then alert'
sudo systemctl restart monit
```

Check config syntax:

```shell
sudo monit -t
```

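Since every step above ends with a restart, the syntax check could be combined with it so that Monit is never restarted with a broken configuration. A minimal sketch, assuming `monit -t` exits with a non-zero status on a syntax error; the function name `safe_restart` and the override variables `MONIT` and `SYSTEMCTL` are hypothetical.

```shell
# Restart Monit only if `monit -t` accepts the configuration.
# MONIT and SYSTEMCTL can be overridden, e.g. to try the logic without root.
safe_restart() {
    if ${MONIT:-sudo monit} -t; then
        ${SYSTEMCTL:-sudo systemctl} restart monit
    else
        echo 'Monit configuration is invalid, not restarting.' >&2
        return 1
    fi
}
```

Calling `safe_restart` instead of `sudo systemctl restart monit` after writing a file to `/etc/monit.d/` would keep the running instance untouched when the new file contains an error.
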
################## TODOS ##########################

* See Firefox bookmark folder 20230219_monit.
* Disk health
* BTRFS balance
* Save disk usage and temperatures to CSV log file
  * e.g. by using `check program check-and-log-temp.sh` monit configuration
  * Or: do checks by monit and every couple minutes run `check program log-system-info.sh`

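A rough sketch of the disk usage part of such a logging script, assuming a CSV layout of `timestamp,mountpoint,used_percent`. The function name `log_disk_usage` and the column layout are assumptions; temperature logging is left out because it depends on the sensors available.

```shell
# Append "timestamp,mountpoint,used_percent" for one filesystem to a CSV file.
# Hypothetical sketch for a log-system-info.sh; the column layout is an assumption.
log_disk_usage() {
    mountpoint=$1
    csv=$2
    # POSIX df output: the 5th column of the 2nd line is the usage, e.g. "42%".
    used=$(df -P "$mountpoint" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    printf '%s,%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$mountpoint" "$used" >> "$csv"
}
```

A script built around this function could then be run by Monit through a `check program` entry with an `every ... cycles` interval, as in the speedtest example.
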
### Monit behind Nginx

TODO: Nginx reverse proxy with basic authentication.