How to Monitor weave-net via kube-prometheus

Dec 29, 2020

## How to Scrape Metrics

## weave-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: weave-net
  name: weave-net
spec:
  clusterIP: None
  ports:
  - name: weave-metrics
    port: 6782
    targetPort: 6782
  selector:
    name: weave-net

## weave-servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: weave-net
  labels:
    k8s-app: weave-net
  namespace: monitoring
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: weave-net
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: weave-metrics
    path: /metrics
    interval: 15s

## weave-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: weave-net
  namespace: monitoring
spec:
  groups:
  - name: weave-net
    rules:
    - alert: WeaveNetCheckType
      annotations:
        message: 'WeaveNet ({{ $labels.pod }}) on instance {{ $labels.instance }} has Type == {{ $labels.type }}'
      expr: weave_connections{type!="fastdp"}
      for: 10m
      labels:
        severity: critical
    - alert: WeaveNetIPAMUnreachable
      annotations:
        message: 'Actionable: Find out why the unreachability percentage has risen above the threshold and fix it. WeaveNet is responsible for keeping it under control. Weave rm peer deployment can help clean things up.'
        summary: WeaveNet IPAM unreachability percentage is above threshold. Go to the below prometheus link for details.
      expr: weave_ipam_unreachable_percentage > 25
      for: 10m
      labels:
        severity: critical
    - alert: WeaveNetIPAMSplitBrain
      annotations:
        message: 'Actionable: Every node should see the same unreachability percentage. Please check why it is not so and fix it.'
        summary: WeaveNet IPAM has a split brain. Go to the below prometheus link for details.
      expr: max(weave_ipam_unreachable_percentage) - min(weave_ipam_unreachable_percentage) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetIPAMPendingAllocates
      annotations:
        message: 'Actionable: Find the reason for IPAM allocates to be in pending state and fix it.'
        summary: WeaveNet IPAM has pending allocates. Go to the below prometheus link for details.
      expr: sum(weave_ipam_pending_allocates) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetIPAMPendingClaims
      annotations:
        message: 'Actionable: Find the reason for IPAM claims to be in pending state and fix it.'
        summary: WeaveNet IPAM has pending claims. Go to the below prometheus link for details.
      expr: sum(weave_ipam_pending_claims) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetFastDPFlowsLow
      annotations:
        message: 'Actionable: Find the reason for fast dp flows dropping below the threshold.'
        summary: WeaveNet total FastDP flows is below threshold. Go to the below prometheus link for details.
      expr: sum(weave_flows) < 20000
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetFastDPFlowsOff
      annotations:
        message: 'Actionable: Find the reason for fast dp being off.'
        summary: WeaveNet FastDP flows are not happening on some or all nodes. Go to the below prometheus link for details.
      expr: sum(weave_flows == bool 0) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetHighConnectionTerminationRate
      annotations:
        message: 'Actionable: Find the reason for high connection termination rate and fix it.'
        summary: A lot of connections are getting terminated. Go to the below prometheus link for details.
      expr: rate(weave_connection_terminations_total[5m]) > 0.1
      for: 5m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsConnecting
      annotations:
        message: 'A lot of connections are in connecting state.'
        summary: A lot of connections are in connecting state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='connecting'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsRetrying
      annotations:
        message: 'A lot of connections are in retrying state.'
        summary: A lot of connections are in retrying state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='retrying'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsPending
      annotations:
        message: 'A lot of connections are in pending state.'
        summary: A lot of connections are in pending state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='pending'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsFailed
      annotations:
        message: 'A lot of connections are in failed state.'
        summary: A lot of connections are in failed state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='failed'}) > 0
      for: 3m
      labels:
        severity: critical

kubectl apply -f weave-service.yaml -n kube-system

kubectl apply -f weave-servicemonitor.yaml -n monitoring

kubectl apply -f weave-rules.yaml -n monitoring
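
After applying the manifests, a quick existence check can save a trip to the Prometheus UI. The sketch below assumes `kubectl` already points at the right cluster and uses the object names and namespaces from the manifests in this post. `check_weave_monitoring` is a hypothetical helper name, not something shipped with kubectl or kube-prometheus.

```shell
# Sketch: confirm that the objects created in this post exist.
# check_weave_monitoring is a made-up helper name for illustration.
check_weave_monitoring() {
  failed=0
  # Each entry is "<namespace> <resource kind> <name>".
  for entry in \
    "kube-system service weave-net" \
    "kube-system endpoints weave-net" \
    "monitoring servicemonitor weave-net" \
    "monitoring prometheusrule weave-net"
  do
    # Split the entry into $1=namespace, $2=kind, $3=name.
    set -- $entry
    if kubectl get "$2" "$3" -n "$1" >/dev/null 2>&1; then
      echo "ok: $2/$3 in $1"
    else
      echo "missing: $2/$3 in $1"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `check_weave_monitoring` prints one line per object and returns a non-zero status if any of them is missing. An existing `endpoints` object does not prove that it lists pod addresses, so it is still worth looking at `kubectl get endpoints weave-net -n kube-system` once by hand.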

## Verify

http://prometheus.yourdomain/rules

http://grafana.yourdomain/
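
If a rule never fires or a target shows no data, it can help to look at the raw numbers Weave exposes. The sketch below assumes you can reach port 6782 and the `/metrics` path used in the Service and ServiceMonitor above (for example from a cluster node), and that sample lines carry no trailing timestamp. `sum_metric` is a hypothetical helper that only roughly mirrors `sum(<metric>)` from the alert expressions.

```shell
# Sketch: sum one metric across all of its label sets from Prometheus
# text-format output on stdin. sum_metric is a made-up helper name.
# Assumption: the sample value is the last field on the line (no timestamp).
sum_metric() {
  awk -v m="$1" '
    # Match "metric{labels} value" or "metric value" at the start of the line,
    # so comment lines ("# HELP", "# TYPE") and longer metric names are skipped.
    index($0, m "{") == 1 || index($0, m " ") == 1 { total += $NF; seen = 1 }
    END { if (seen) print total; else print "absent" }
  '
}
```

For example, `curl -s http://localhost:6782/metrics | sum_metric weave_ipam_pending_allocates` on a node where Weave answers on localhost should print the value that the `WeaveNetIPAMPendingAllocates` rule compares against 0 for that node, or `absent` if the metric is not exposed at all.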

Ref.

https://stackoverflow.com/questions/60232516/on-what-metric-weave-net-should-be-alerted-on

https://github.com/prometheus-operator/kube-prometheus/blob/master/docs/weave-net-support.md

https://grafana.com/grafana/dashboards/11804


Written by Thanaphat Nuangjumnong