
Alertmanager Notes

Download Alertmanager

https://prometheus.io/download/

Extract to a specific directory

After downloading, extract the archive and move it to the target directory:

tar xvfz alertmanager-*.tar.gz
# The trailing slash makes the glob match only the extracted directory, not the tarball.
mv alertmanager-*/ /usr/local/alertmanager
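If you prefer to script the download as well, the following is a minimal sketch. The version number, OS and architecture are assumptions chosen for illustration (check the download page for the release you actually want), and the function names are hypothetical helpers, not part of Alertmanager.

```shell
#!/bin/sh
# Hypothetical values: pick the real version and platform from the download page.
VERSION="0.27.0"
OS="linux"
ARCH="amd64"

# Tarball name, following the naming used for Alertmanager release archives.
alertmanager_tarball() {
    printf 'alertmanager-%s.%s-%s.tar.gz\n' "$VERSION" "$OS" "$ARCH"
}

# GitHub release URL for that tarball.
alertmanager_url() {
    printf 'https://github.com/prometheus/alertmanager/releases/download/v%s/%s\n' \
        "$VERSION" "$(alertmanager_tarball)"
}

# Download the tarball and unpack its contents directly into the target directory.
install_alertmanager() {
    curl -fsSLO "$(alertmanager_url)" &&
        mkdir -p /usr/local/alertmanager &&
        tar xzf "$(alertmanager_tarball)" -C /usr/local/alertmanager --strip-components=1
}
```

Nothing happens until you call install_alertmanager (as root, since it writes to /usr/local). Using --strip-components=1 drops the versioned top-level directory, so no separate mv step is needed.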

Set up the service unit file

Example systemd unit file:

[Unit]
Description=Alertmanager Service
After=network.target rsyslog.target
Wants=network.target

[Service]
Type=simple
User=root
Group=root
ExecStartPre=/usr/local/alertmanager/amtool check-config /usr/local/alertmanager/alertmanager.yml
ExecStart=/usr/local/alertmanager/alertmanager \
    --config.file=/usr/local/alertmanager/alertmanager.yml
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
PrivateTmp=true
Restart=on-failure

[Install]
WantedBy=multi-user.target
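Once the unit file is in place, systemd has to be told about it. The sketch below assumes the file was saved as /etc/systemd/system/alertmanager.service; that path and the unit name "alertmanager" are assumptions, and the function names are hypothetical helpers.

```shell
#!/bin/sh
# Assumed unit name, matching a file saved as /etc/systemd/system/alertmanager.service.
UNIT_NAME="alertmanager"

# Reload systemd so it picks up the new unit, then enable it at boot and start it.
start_alertmanager() {
    systemctl daemon-reload &&
        systemctl enable "$UNIT_NAME" &&
        systemctl start "$UNIT_NAME"
}

# After editing alertmanager.yml, trigger ExecReload (SIGHUP) instead of a full restart.
reload_alertmanager() {
    systemctl reload "$UNIT_NAME"
}
```

Call start_alertmanager once after installing the unit, and reload_alertmanager whenever the configuration changes.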

Configure alertmanager.yml

global:
  slack_api_url: your_webhook_url
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: "slack"
  routes:
  - match:
      job: "idc_mail"
    group_by: ['host']
    routes:
    - match:
        severity: "critical"
      receiver: "slack"
    - match:
        severity: "warning"
      receiver: "email"
  - match:
      job: "proxmox"
    group_by: ['instance']
    group_wait: 10s
    routes:
    - match:
        severity: "critical"
      receiver: "slack"
    - match:
        severity: "warning"
      receiver: "email"
  - match:
      job: "node_gce"
    group_by: ['zone']
    group_wait: 10s
    routes:
    - match:
        severity: "critical"
      receiver: "slack"
    - match:
        severity: "warning"
      receiver: "email"
  - match:
      job: "Domain"
    group_by: ['domain']
    receiver: "email"

receivers:
- name: slack
  slack_configs:
  - api_url: 'your_webhook_url'
    username: "TigerFly Project's Alert"
    channel: 'cts_alert'
    icon_url: https://avatars3.githubusercontent.com/u/3380462
    send_resolved: true
    title: |-
      [TigerFly Project][{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
      {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
        {{" "}}(
        {{- with .CommonLabels.Remove .GroupLabels.Names }}
          {{- range $index, $label := .SortedPairs -}}
            {{ if $index }}, {{ end }}
            {{- $label.Name }}="{{ $label.Value -}}"
          {{- end }}
        {{- end -}}
        )
      {{- end }}
    text: >-
      {{ with index .Alerts 0 -}}
        :chart_with_upwards_trend: *<{{ .GeneratorURL }}|Graph>*
        {{- if .Annotations.runbook }}   :notebook: *<{{ .Annotations.runbook }}|Runbook>*{{ end }}
      {{ end }}

      *Alert details*:

      {{ range .Alerts -}}
        *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - {{ .Labels.severity }}{{ end }}
      *Summary:* {{ .Annotations.summary }}
      *Description:* {{ .Annotations.description }}
      *Details:*
        {{ range .Labels.SortedPairs }} • *{{ .Name }}:* {{ .Value }}
        {{ end }}
      {{ end }}

- name: 'email'
  email_configs:
  - to: 'your_account@example.com'
    from: 'your_account@example.com'
    smarthost: smtp.gmail.com:587
    auth_username: 'your_account@example.com'
    auth_password: 'your_account_password'
    headers:
      From: "TigerFly Prometheus"
      Subject: "TigerFly Monitor Alert"
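Before reloading the service it can help to check which receiver a given set of labels ends up at. The following is a rough sketch using amtool's check-config and config routes test subcommands; the config path is the one used in the unit file, and the function names are hypothetical helpers.

```shell
#!/bin/sh
# Config path as used in the unit file; adjust if yours differs.
CONFIG="/usr/local/alertmanager/alertmanager.yml"

# Validate the configuration file.
check_config() {
    amtool check-config "$CONFIG"
}

# Print the receiver(s) an alert with the given labels would be routed to.
# Usage: route_for job=proxmox severity=warning
route_for() {
    amtool config routes test --config.file="$CONFIG" "$@"
}
```

With the routing tree above, route_for job=proxmox severity=warning should resolve to the email receiver, while labels that match no child route should fall back to the top-level slack receiver.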

For more template examples, see the official repository: https://github.com/prometheus/alertmanager/tree/master/template

What an actual alert notification looks like:

[Screenshot: alert]
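To see a notification without waiting for a real incident, you can push a hand-made alert to Alertmanager's v2 API. This is only a sketch: the address is the one used in the amtool examples of this post, and every label and annotation value below is made up for testing.

```shell
#!/bin/sh
# Address as used in the amtool examples of this post; adjust to your setup.
AM_URL="http://localhost:9003"

# Build a one-alert JSON payload. All label and annotation values are test data.
test_alert_payload() {
    cat <<'EOF'
[
  {
    "labels": {
      "alertname": "TestAlert",
      "job": "proxmox",
      "instance": "test-host",
      "severity": "critical"
    },
    "annotations": {
      "title": "Test alert",
      "summary": "Manual test of the Slack template",
      "description": "Sent by hand with curl"
    }
  }
]
EOF
}

# POST the payload to Alertmanager's v2 alerts endpoint.
send_test_alert() {
    test_alert_payload | curl -fsS -X POST -H 'Content-Type: application/json' \
        --data-binary @- "$AM_URL/api/v2/alerts"
}
```

After calling send_test_alert, the proxmox route waits for its group_wait of 10s before notifying, so the Slack message should show up shortly afterwards.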

Managing silences with amtool

# Silence example. The default expiry is 1h; use --expires or --expire-on to set a longer period.
# The command prints the ID of the new silence.
amtool --alertmanager.url=http://localhost:9003 silence add alertname=InstancesGone service=application1

# Expire the silence by its ID (SILENCE_ID holds the value printed by the command above).
amtool --alertmanager.url=http://localhost:9003 silence expire "$SILENCE_ID"
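The two commands can be wrapped in small helpers so the ID is captured automatically. This sketch assumes a recent amtool release, where the length of a silence is set with --duration and a comment is passed with --comment; on older releases substitute the expiry flags mentioned in the comments above. The function names are hypothetical.

```shell
#!/bin/sh
# Same address as in the commands above.
AM_URL="http://localhost:9003"

# Create a silence and print its ID.
# Usage: add_silence <duration> <comment> <matcher>...
add_silence() {
    duration="$1"
    comment="$2"
    shift 2
    amtool --alertmanager.url="$AM_URL" silence add \
        --duration="$duration" --comment="$comment" "$@"
}

# Expire a silence by its ID.
expire_silence() {
    amtool --alertmanager.url="$AM_URL" silence expire "$1"
}

# List the currently active silences.
list_silences() {
    amtool --alertmanager.url="$AM_URL" silence query
}
```

For example, SILENCE_ID=$(add_silence 2h "planned maintenance" alertname=InstancesGone service=application1) creates the silence, and expire_silence "$SILENCE_ID" lifts it early.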

Beck Yeh
