Download Alertmanager
https://prometheus.io/download/
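Extract to a target directory
After downloading, extract the archive and move the extracted directory into place:
tar xvfz alertmanager-*.tar.gz
# The trailing slash makes the glob match only the extracted directory, not the tarball.
mv alertmanager-*/ /usr/local/alertmanager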
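The move step can trip on the glob: once the archive is extracted, `alertmanager-*` matches both the tarball and the extracted directory. The helper below is a minimal sketch of a safer move, not part of the original instructions. The function name `install_alertmanager` and its two arguments are hypothetical.

```shell
# install_alertmanager SRC_PARENT DEST
# Hypothetical helper: moves the single extracted alertmanager-*/ directory
# found under SRC_PARENT to DEST. It refuses to run if DEST already exists
# or if it cannot find exactly one extracted release directory.
install_alertmanager() {
    src_parent=$1
    dest=$2

    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi

    # The trailing slash restricts the glob to directories, so the
    # alertmanager-*.tar.gz file sitting next to it is never picked up.
    set -- "$src_parent"/alertmanager-*/
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one extracted alertmanager-* directory in $src_parent" >&2
        return 1
    fi

    # Strip the trailing slash before handing the path to mv.
    mv "${1%/}" "$dest"
}
```

Run from the download directory, a call such as `install_alertmanager . /usr/local/alertmanager` would perform the same move as the commands above, assuming sufficient privileges to write to `/usr/local`.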
Configure the service unit file
systemd
Example unit file:
[Unit]
Description=Alertmanager Service
After=network.target rsyslog.target
Wants=network.target
[Service]
Type=simple
User=root
Group=root
ExecStartPre=/usr/local/alertmanager/amtool check-config /usr/local/alertmanager/alertmanager.yml
ExecStart=/usr/local/alertmanager/alertmanager \
--config.file /usr/local/alertmanager/alertmanager.yml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
Restart=on-failure
[Install]
WantedBy=multi-user.target
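Before enabling the unit, it can help to confirm that every path it references actually exists. The following is a minimal sketch under that assumption. The function name `preflight` is hypothetical, and the default prefix simply mirrors the paths used in the unit file.

```shell
# preflight [PREFIX]
# Hypothetical check: verifies that the files the unit file points at exist
# under PREFIX (default /usr/local/alertmanager). Prints each problem to
# stderr and returns non-zero if anything is missing.
preflight() {
    prefix=${1:-/usr/local/alertmanager}
    status=0

    # ExecStart needs the alertmanager binary; ExecStartPre needs amtool.
    for bin in alertmanager amtool; do
        if [ ! -x "$prefix/$bin" ]; then
            echo "missing or not executable: $prefix/$bin" >&2
            status=1
        fi
    done

    # Both commands read the same configuration file.
    if [ ! -r "$prefix/alertmanager.yml" ]; then
        echo "missing or unreadable: $prefix/alertmanager.yml" >&2
        status=1
    fi

    return "$status"
}
```

Once `preflight` succeeds, and assuming the unit was saved as `/etc/systemd/system/alertmanager.service` (the original does not name a path), `systemctl daemon-reload` followed by `systemctl enable --now alertmanager` should load and start the service.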
Configure alertmanager.yml
global:
  slack_api_url: your_webhook_url
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: "slack"
  routes:
    - match:
        job: "idc_mail"
      group_by: ['host']
      routes:
        - match:
            severity: "critical"
          receiver: "slack"
        - match:
            severity: "warning"
          receiver: "email"
    - match:
        job: "proxmox"
      group_by: ['instance']
      group_wait: 10s
      routes:
        - match:
            severity: "critical"
          receiver: "slack"
        - match:
            severity: "warning"
          receiver: "email"
    - match:
        job: "node_gce"
      group_by: ['zone']
      group_wait: 10s
      routes:
        - match:
            severity: "critical"
          receiver: "slack"
        - match:
            severity: "warning"
          receiver: "email"
    - match:
        job: "Domain"
      group_by: ['domain']
      receiver: "email"
receivers:
  - name: slack
    slack_configs:
      - api_url: 'your_webhook_url'
        username: "TigerFly Project's Alert"
        channel: 'cts_alert'
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        send_resolved: true
        title: |-
          [TigerFly Project][{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
          {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
            {{" "}}(
            {{- with .CommonLabels.Remove .GroupLabels.Names }}
              {{- range $index, $label := .SortedPairs -}}
                {{ if $index }}, {{ end }}
                {{- $label.Name }}="{{ $label.Value -}}"
              {{- end }}
            {{- end -}}
            )
          {{- end }}
        text: >-
          {{ with index .Alerts 0 -}}
            :chart_with_upwards_trend: *<{{ .GeneratorURL }}|Graph>*
            {{- if .Annotations.runbook }} :notebook: *<{{ .Annotations.runbook }}|Runbook>*{{ end }}
          {{ end }}

          *Alert details*:

          {{ range .Alerts -}}
            *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`
            {{ end }}

          *Summary:* {{ .Annotations.summary }}

          *Description:* {{ .Annotations.description }}

          *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: smtp.gmail.com:587
        auth_username: '[email protected]'
        auth_password: 'your_account_password'
        headers:
          From: "TigerFly Prometheus"
          Subject: "TigerFly Monitor Alert"
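To make the routing tree easier to reason about, the sketch below mirrors it by hand as a shell function. It is an illustration only, since Alertmanager performs this matching itself. It assumes that each nested `routes` list is a child of the job route directly above it, and that a route without its own `receiver` inherits the root receiver `slack`. The function name `route_receiver` is hypothetical.

```shell
# route_receiver JOB SEVERITY
# Prints the receiver the routing tree in alertmanager.yml would select for
# an alert carrying the given job and severity labels.
route_receiver() {
    job=$1
    severity=$2

    case $job in
        idc_mail|proxmox|node_gce)
            # These three job routes have the same children:
            # critical -> slack, warning -> email.
            case $severity in
                critical) echo slack ;;
                warning)  echo email ;;
                # No child matched: the job route handles the alert itself
                # and, having no receiver of its own, inherits "slack".
                *)        echo slack ;;
            esac
            ;;
        Domain)
            # This route sets its receiver directly and has no children.
            echo email
            ;;
        *)
            # No first-level route matched: the root receiver applies.
            echo slack
            ;;
    esac
}
```

For example, `route_receiver proxmox warning` prints `email`, while an alert from a job that is not listed falls through to `slack`. If your amtool build provides the `config routes test` subcommand, something like `amtool config routes test --config.file=/usr/local/alertmanager/alertmanager.yml job=proxmox severity=warning` can check the same thing against the real configuration.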
For more template examples, see the official ones: https://github.com/prometheus/alertmanager/tree/master/template
What an actual alert notification looks like:
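Managing silences with amtool
# Silence example. The default expiry is 1h; for a longer window use --expires / --expire-on
# (older amtool releases) or --duration / --end (newer releases).
# 9093 is Alertmanager's default port; change the URL if you set --web.listen-address.
# The command prints the ID of the new silence.
amtool --alertmanager.url=http://localhost:9093 silence add alertname=InstancesGone service=application1
# Expire the silence by passing that ID (stored here in SILENCE_ID).
amtool --alertmanager.url=http://localhost:9093 silence expire "$SILENCE_ID"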
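Because `silence add` prints the new silence ID, the two commands can be chained in a script. The wrappers below are a minimal sketch under stated assumptions: the function names and the `AMTOOL`, `AM_URL`, and `SILENCE_COMMENT` variables are hypothetical, and the default URL assumes Alertmanager's default port 9093, so set `AM_URL` to match your own setup.

```shell
# Assumptions (not from the original text): amtool is on PATH unless AMTOOL
# is set, and Alertmanager is reachable at AM_URL.
AMTOOL=${AMTOOL:-amtool}
AM_URL=${AM_URL:-http://localhost:9093}

# silence_add MATCHER...
# Creates a silence for the given matchers and prints the ID amtool returns.
# A comment is always passed, since some amtool versions require one.
silence_add() {
    "$AMTOOL" --alertmanager.url="$AM_URL" silence add \
        --comment="${SILENCE_COMMENT:-maintenance}" "$@"
}

# silence_expire ID
# Expires the silence with the given ID.
silence_expire() {
    "$AMTOOL" --alertmanager.url="$AM_URL" silence expire "$1"
}

# Example:
#   id=$(silence_add alertname=InstancesGone service=application1)
#   ... perform the maintenance ...
#   silence_expire "$id"
```

Capturing the ID in a variable avoids copying it by hand between the two commands.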