Monitoring Setup
Set up monitoring for VMProber using Prometheus and Grafana.
Prerequisites
- Prometheus installed and running
- Grafana installed and running (optional)
- VMProber running and accessible
Prometheus Configuration
1. Add VMProber to Prometheus
Edit prometheus.yml:
scrape_configs:
- job_name: 'vmprober'
scrape_interval: 30s
scrape_timeout: 10s
static_configs:
- targets: ['vmprober:8429']
metrics_path: '/metrics'
2. Reload Prometheus
# If using systemd
sudo systemctl reload prometheus
# Or send SIGHUP
kill -HUP <prometheus-pid>
3. Verify Scraping
Check Prometheus targets page: http://prometheus:9090/targets
Grafana Dashboard
1. Create Dashboard
- Go to Grafana:
http://grafana:3000 - Create new dashboard
- Add panels for key metrics
2. Key Panels
Success Rate
rate(vmprober_probe_success_total[5m]) / rate(vmprober_probe_attempts_total[5m])
Average RTT
rate(vmprober_probe_rtt_seconds_sum[5m]) / rate(vmprober_probe_rtt_seconds_count[5m])
95th Percentile RTT
histogram_quantile(0.95, rate(vmprober_probe_rtt_seconds_bucket[5m]))
Failure Rate by Target
rate(vmprober_probe_failure_total[5m]) / rate(vmprober_probe_attempts_total[5m])
Alerts
High Failure Rate
- alert: HighProbeFailureRate
expr: |
rate(vmprober_probe_failure_total[5m]) /
rate(vmprober_probe_attempts_total[5m]) > 0.1
for: 5m
annotations:
summary: "High probe failure rate for {{ $labels.target }}"
High Latency
- alert: HighProbeLatency
expr: |
histogram_quantile(0.95,
rate(vmprober_probe_rtt_seconds_bucket[5m])
) > 1
for: 5m
annotations:
summary: "High probe latency for {{ $labels.target }}"
Service Discovery
For Kubernetes, use ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: vmprober
spec:
selector:
matchLabels:
app: vmprober
endpoints:
- port: http
path: /metrics
interval: 30s
Next Steps
- Operations: Monitoring - Advanced monitoring
- Reference: Metrics - All available metrics