How to monitor weave-net via kube-prometheus

Thanaphat Nuangjumnong
Dec 29, 2020

## How to scrape metrics

## weave-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: weave-net
  name: weave-net
spec:
  clusterIP: None
  ports:
  - name: weave-metrics
    port: 6782
    targetPort: 6782
  selector:
    name: weave-net
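
Before Prometheus is involved, it can help to confirm that weave-net really serves metrics on port 6782, the port this Service targets. The sketch below is one way to do that. `NODE_IP` is a placeholder for any node address, and the helper name `weave_metric_names` is made up for this example; it reads Prometheus text-format output on stdin and prints the distinct `weave_*` metric names.

```shell
#!/bin/sh
# Hypothetical helper: list distinct weave_* metric names from
# Prometheus text-format output read on stdin.
weave_metric_names() {
  grep -v '^#' |         # drop HELP/TYPE comment lines
    grep '^weave_' |     # keep only weave-net samples
    sed 's/[{ ].*//' |   # strip labels and the sample value
    sort -u
}

# Usage (NODE_IP is a placeholder for one of your node addresses):
#   curl -s "http://${NODE_IP}:6782/metrics" | weave_metric_names
```

If names such as `weave_connections` or `weave_flows` show up, the metrics used by the alert rules further down are available to scrape.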

## weave-servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: weave-net
  labels:
    k8s-app: weave-net
  namespace: monitoring
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: weave-net
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: weave-metrics
    path: /metrics
    interval: 15s

## weave-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: weave-net
  namespace: monitoring
spec:
  groups:
  - name: weave-net
    rules:
    - alert: WeaveNet Check Type
      annotations:
        message: 'WeaveNet ({{ $labels.pod }}) on instance {{ $labels.instance }} has Type == {{ $labels.type }}'
      expr: weave_connections{type!="fastdp"}
      for: 10m
      labels:
        severity: critical
    - alert: WeaveNet IPAM Unreachable
      annotations:
        message: 'Actionable: Find why the unreachability percentage has increased above the threshold and fix it. WeaveNet is responsible for keeping it under control. Weave rm peer deployment can help clean things.'
        summary: WeaveNetIPAM unreachability percentage is above threshold. Go to the below prometheus link for details.
      expr: weave_ipam_unreachable_percentage > 25
      for: 10m
      labels:
        severity: critical
    - alert: WeaveNet IPAM Split Brain
      annotations:
        message: 'Actionable: Every node should see the same unreachability percentage. Please check and fix why it is not so.'
        summary: WeaveNetIPAM has a split brain. Go to the below prometheus link for details.
      expr: max(weave_ipam_unreachable_percentage) - min(weave_ipam_unreachable_percentage) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetIPAMPendingAllocates
      annotations:
        message: 'Actionable: Find the reason for IPAM allocates to be in pending state and fix it.'
        summary: WeaveNet IPAM has pending allocates. Go to the below prometheus link for details.
      expr: sum(weave_ipam_pending_allocates) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetIPAMPendingClaims
      annotations:
        message: 'Actionable: Find the reason for IPAM claims to be in pending state and fix it.'
        summary: WeaveNet IPAM has pending claims. Go to the below prometheus link for details.
      expr: sum(weave_ipam_pending_claims) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetFastDPFlowsLow
      annotations:
        message: 'Actionable: Find the reason for fast dp flows dropping below the threshold.'
        summary: WeaveNet total FastDP flows is below threshold. Go to the below prometheus link for details.
      expr: sum(weave_flows) < 20000
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetFastDPFlowsOff
      annotations:
        message: 'Actionable: Find the reason for fast dp being off.'
        summary: WeaveNet FastDP flows is not happening in some or all nodes. Go to the below prometheus link for details.
      expr: sum(weave_flows == bool 0) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetHighConnectionTerminationRate
      annotations:
        message: 'Actionable: Find the reason for high connection termination rate and fix it.'
        summary: A lot of connections are getting terminated. Go to the below prometheus link for details.
      expr: rate(weave_connection_terminations_total[5m]) > 0.1
      for: 5m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsConnecting
      annotations:
        message: 'A lot of connections are in connecting state.'
        summary: A lot of connections are in connecting state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='connecting'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsRetrying
      annotations:
        message: 'A lot of connections are in retrying state.'
        summary: A lot of connections are in retrying state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='retrying'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsPending
      annotations:
        message: 'A lot of connections are in pending state.'
        summary: A lot of connections are in pending state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='pending'}) > 0
      for: 3m
      labels:
        severity: critical
    - alert: WeaveNetConnectionsFailed
      annotations:
        message: 'A lot of connections are in failed state.'
        summary: A lot of connections are in failed state. Go to the below prometheus link for details.
      expr: sum(weave_connections{state='failed'}) > 0
      for: 3m
      labels:
        severity: critical
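
Alert rules like these can be exercised offline with `promtool test rules` before they reach the cluster. The sketch below writes a unit test for a single alert, `WeaveNetIPAMPendingAllocates`. It rests on assumptions not in the manifests above: `promtool` reads plain Prometheus rule files rather than `PrometheusRule` objects, so `weave-rules-groups.yaml` is a hypothetical file holding only the `groups:` section of the spec, and the helper name `write_weave_rule_test` and the `node-1` instance label are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a promtool unit test for one alert to the
# path given as $1.
write_weave_rule_test() {
  cat > "$1" <<'EOF'
rule_files:
  - weave-rules-groups.yaml  # assumed: only the `groups:` section of the PrometheusRule
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'weave_ipam_pending_allocates{instance="node-1"}'
        values: '1+0x10'  # constant value 1 for ten minutes
    alert_rule_test:
      - eval_time: 5m  # past the rule's 3m "for" window
        alertname: WeaveNetIPAMPendingAllocates
        exp_alerts:
          - exp_labels:
              severity: critical
            exp_annotations:
              message: 'Actionable: Find the reason for IPAM allocates to be in pending state and fix it.'
              summary: 'WeaveNet IPAM has pending allocates. Go to the below prometheus link for details.'
EOF
}

# Usage:
#   write_weave_rule_test weave-rules-test.yaml
#   promtool test rules weave-rules-test.yaml
```

Because the rule wraps the metric in `sum(...)`, the firing alert carries no `instance` label, which is why only `severity` appears under `exp_labels`.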

kubectl apply -f weave-service.yaml -n kube-system

kubectl apply -f weave-servicemonitor.yaml -n monitoring

kubectl apply -f weave-rules.yaml -n monitoring
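
Once the three manifests are applied, a quick check that each object exists can catch a wrong namespace early. This is a minimal sketch: the function name `check_weave_monitoring` is made up here, and it assumes `kubectl` is pointed at the right cluster.

```shell
#!/bin/sh
# Hypothetical helper: confirm the objects created by the three
# `kubectl apply` commands exist in the expected namespaces.
check_weave_monitoring() {
  kubectl -n kube-system get service weave-net &&
  kubectl -n kube-system get endpoints weave-net &&
  kubectl -n monitoring get servicemonitor weave-net &&
  kubectl -n monitoring get prometheusrule weave-net
}

# Usage:
#   check_weave_monitoring
```

If the Endpoints object lists no addresses, the Service selector `name: weave-net` is probably not matching the labels on the weave pods.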

## verify

http://prometheus.yourdomain/rules

http://grafana.yourdomain/
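
The same check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below counts how many weave-net targets report `up == 1`. It assumes the job label resolves to `weave-net` (the ServiceMonitor sets `jobLabel: k8s-app`, and the Service carries `k8s-app: weave-net`), that the API is reachable without authentication, and that the response is the compact JSON Prometheus normally returns. The helper name `count_up_targets` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count series whose value is "1" in an
# instant-query response from the Prometheus HTTP API, read on stdin.
count_up_targets() {
  grep -o '"value":\[[0-9.]*,"1"]' | wc -l | tr -d ' '
}

# Usage (prometheus.yourdomain is the placeholder host used above):
#   curl -sG 'http://prometheus.yourdomain/api/v1/query' \
#     --data-urlencode 'query=up{job="weave-net"}' | count_up_targets
```

The count should normally equal the number of nodes running a weave-net pod; a lower number points at targets Prometheus cannot scrape.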

Ref.

https://stackoverflow.com/questions/60232516/on-what-metric-weave-net-should-be-alerted-on

https://github.com/prometheus-operator/kube-prometheus/blob/master/docs/weave-net-support.md

https://grafana.com/grafana/dashboards/11804
