System Overview
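VMProber is a network monitoring tool designed for production environments. It probes targets over TCP, UDP, and ICMP, normalizes the results, and exports them either through an HTTP endpoint for Prometheus scraping (pull) or by pushing to VictoriaMetrics (push).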
High-Level Architecture
flowchart TB
    subgraph ConfigLayer["Configuration Layer"]
        YAML["YAML Config"]
        Files["Files Source"]
        HTTPSource["HTTP Source"]
        Commands["Commands Source"]
    end
    subgraph CoreEngine["Core Engine"]
        Scheduler["Scheduler"]
        WorkerPool["Worker Pool"]
        ProbeEngine["Probe Engine"]
        Scheduler --> WorkerPool
        Scheduler --> ProbeEngine
        WorkerPool --> ProbeEngine
    end
    ConfigLayer --> CoreEngine
    subgraph ProbeTypes["Probe Types"]
        TCP["TCP Probe"]
        UDP["UDP Probe"]
        ICMP["ICMP Probe"]
    end
    ProbeEngine --> TCP
    ProbeEngine --> UDP
    ProbeEngine --> ICMP
    TCP --> Normalizer["Result Normalizer"]
    UDP --> Normalizer
    ICMP --> Normalizer
    Normalizer --> MetricsSystem["Metrics System"]
    Normalizer --> WAL["WAL System"]
    MetricsSystem --> HTTPServer["HTTP Server (Pull)"]
    WAL --> VMAdapter["VM Adapter (Push)"]
Core Components
1. Configuration Layer
- Loads configuration from multiple sources
- Supports hot reload
- Validates configuration on load
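To illustrate validation on load combined with hot reload, here is a minimal Python sketch. It is not VMProber's actual loader: the field names (`host`, `probe`, `port`, `interval_s`) and the "keep the last good configuration" reload policy are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

VALID_PROBE_TYPES = {"tcp", "udp", "icmp"}  # probe types named in this document


@dataclass(frozen=True)
class Target:
    host: str
    probe: str
    interval_s: float
    port: Optional[int] = None


def validate_targets(raw_targets):
    """Return validated targets, or raise ValueError listing every problem found."""
    errors, targets = [], []
    for i, item in enumerate(raw_targets):
        problems = []
        host = item.get("host")
        probe = item.get("probe")
        interval = item.get("interval_s", 0)
        port = item.get("port")
        if not host:
            problems.append("host is required")
        if probe not in VALID_PROBE_TYPES:
            problems.append(f"unknown probe type {probe!r}")
        if not isinstance(interval, (int, float)) or interval <= 0:
            problems.append("interval_s must be a positive number")
        if probe in ("tcp", "udp") and not (isinstance(port, int) and 1 <= port <= 65535):
            problems.append("port must be an integer in 1..65535")
        if problems:
            errors.extend(f"targets[{i}]: {p}" for p in problems)
        else:
            targets.append(Target(host, probe, float(interval), port))
    if errors:
        raise ValueError("; ".join(errors))
    return targets


def reload_config(current, raw_targets):
    """Hot reload (assumed policy): keep the current config if the new one is invalid."""
    try:
        return validate_targets(raw_targets)
    except ValueError:
        return current
```

Collecting all problems before raising lets an operator fix a configuration in one pass instead of one error at a time.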
2. Scheduler
- Manages probe execution schedule
- Implements jitter for load distribution
- Handles rate limiting per host and globally
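The two scheduler techniques named above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not VMProber's scheduler: the jitter fraction, the token-bucket algorithm, and all names are hypothetical choices for the example.

```python
import random


def next_run_time(now, interval_s, jitter_fraction=0.1, rng=random):
    """Next run = now + interval, shifted by up to +/- jitter_fraction of the interval.

    Jitter spreads probes that share an interval so they do not all fire together.
    """
    jitter = rng.uniform(-jitter_fraction, jitter_fraction) * interval_s
    return now + interval_s + jitter


class TokenBucket:
    """Token-bucket rate limiter: `rate_per_s` tokens refill per second, up to `burst`."""

    def __init__(self, rate_per_s, burst, now=0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now):
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One way to get both per-host and global limits would be to keep one bucket per host plus a single shared bucket, and run a probe only when both allow it.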
3. Worker Pool
- Executes probes concurrently
- Limits concurrency to prevent resource exhaustion
- Manages worker lifecycle
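A bounded worker pool can be sketched with Python's standard `concurrent.futures` module. This is an assumption-laden illustration rather than VMProber's implementation; `run_probes` and `probe_fn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_probes(probe_fn, targets, max_workers=4):
    """Run probe_fn over targets with at most max_workers probes in flight.

    Returns (target, result, error) tuples in input order. A failing probe is
    recorded as an error instead of aborting the whole batch.
    """
    results = []
    # The context manager waits for workers to finish and releases them on exit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(target, pool.submit(probe_fn, target)) for target in targets]
        for target, future in futures:
            try:
                results.append((target, future.result(), None))
            except Exception as exc:  # keep going; report the failure per target
                results.append((target, None, exc))
    return results
```

Capping `max_workers` is what prevents resource exhaustion: queued probes wait for a free worker instead of opening more sockets.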
4. Probe Engine
- Executes different probe types (TCP, UDP, ICMP)
- Handles timeouts and retries
- Collects detailed probe results
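As a concrete example of timeouts, retries, and result collection, here is a minimal TCP connect probe in Python. It is a sketch only: the `ProbeResult` fields, the retry count, and the function name are assumptions, and real probe results likely carry more detail.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    host: str
    port: int
    success: bool
    attempts: int
    latency_s: Optional[float]  # connect time of the successful attempt
    error: Optional[str]        # last error seen, if the probe failed


def tcp_probe(host, port, timeout_s=1.0, retries=2):
    """Try to open a TCP connection, retrying up to `retries` extra times."""
    last_error = None
    for attempt in range(1, retries + 2):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout_s):
                latency = time.monotonic() - start
                return ProbeResult(host, port, True, attempt, latency, None)
        except OSError as exc:
            last_error = f"{type(exc).__name__}: {exc}"
    return ProbeResult(host, port, False, retries + 1, None, last_error)
```

`time.monotonic()` is used for latency because it is not affected by wall-clock adjustments.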
5. Result Normalizer
- Normalizes probe results to unified format
- Performs deduplication
- Enriches with metadata
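The three normalizer responsibilities can be sketched as follows. The unified record layout, the deduplication key, and the label enrichment shown here are assumptions for illustration; the document does not specify VMProber's actual format.

```python
def normalize(raw, extra_labels):
    """Map a raw probe result to one unified record shape and attach metadata."""
    host = raw["host"].lower()
    port = raw.get("port")  # ICMP results have no port
    latency = raw.get("latency_s")
    return {
        "target": f"{host}:{port}" if port is not None else host,
        "probe": raw["probe"].lower(),
        "timestamp_s": int(raw["timestamp"]),
        "success": bool(raw["success"]),
        "latency_ms": round(latency * 1000, 3) if latency is not None else None,
        "labels": dict(extra_labels),  # enrichment, e.g. environment or region
    }


class Deduplicator:
    """Drops records already seen for the same target, probe type, and second.

    The seen-set is unbounded here; a real implementation would expire old keys.
    """

    def __init__(self):
        self._seen = set()

    def is_new(self, record):
        key = (record["target"], record["probe"], record["timestamp_s"])
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```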
6. Metrics System
- Collects Prometheus metrics
- Exposes via HTTP endpoint
- Supports custom labels and buckets
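To show what custom buckets mean in practice, the following Python sketch implements a tiny histogram and renders it in the Prometheus text exposition format, where bucket counts are cumulative and the last bucket is `+Inf`. The metric name `probe_duration_seconds` is hypothetical, and a real deployment would normally use a Prometheus client library instead of hand-rolled code.

```python
import bisect


class Histogram:
    def __init__(self, name, buckets):
        self.name = name
        self.bounds = sorted(buckets)                # upper bounds ("le" values)
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the smallest bucket
        # whose upper bound includes it; values above all bounds land in +Inf.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self):
        """Render cumulative buckets, sum, and count as exposition-format lines."""
        lines, running = [], 0
        labels = [repr(b) for b in self.bounds] + ["+Inf"]
        for le, count in zip(labels, self.counts):
            running += count
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return "\n".join(lines)
```

Bucket bounds are usually chosen around the latencies that matter for alerting, since quantile estimates are only as fine as the buckets.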
7. WAL System
- Write-ahead log (WAL): results are persisted before they are exported
- Protects results from loss if the process crashes
- Supports compression and rotation
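The core write-ahead idea can be sketched in a few lines of Python: append each record durably before acting on it, and on restart replay whatever was written. This is a simplified illustration, not VMProber's WAL: the JSON-lines format and class name are assumptions, and compression and rotation are omitted.

```python
import json
import os


class WriteAheadLog:
    def __init__(self, path):
        self.path = path

    def append(self, record):
        """Append one record as a JSON line and force it to disk before returning."""
        line = json.dumps(record, separators=(",", ":")) + "\n"
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # durable once this returns

    def replay(self):
        """Return all complete records; stop at a torn (partially written) line."""
        records = []
        if not os.path.exists(self.path):
            return records
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.endswith("\n"):
                    break  # crash happened mid-write; ignore the partial tail
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    break
        return records
```

Calling `fsync` on every append trades throughput for durability; batching several records per sync is a common compromise.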
8. Export Layer
- Pull Mode: HTTP server for Prometheus scraping
- Push Mode: VictoriaMetrics adapter with retry logic
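Retry logic for push mode can be sketched as capped exponential backoff around a send function. The sketch below is hypothetical throughout: `send` stands in for whatever call delivers a batch to VictoriaMetrics, and the base delay, cap, and retry count are illustrative defaults, not VMProber settings.

```python
import time


def backoff_delays(retries, base_s=0.5, cap_s=30.0):
    """Delays double on each retry (base, 2*base, 4*base, ...) up to cap_s."""
    return [min(cap_s, base_s * 2 ** i) for i in range(retries)]


def push_with_retry(send, batch, retries=5, base_s=0.5, cap_s=30.0, sleep=time.sleep):
    """Call send(batch) until it succeeds or retries are exhausted.

    Returns the number of attempts made. The final failure is re-raised so the
    caller can keep the batch (for example in the WAL) and try again later.
    """
    delays = backoff_delays(retries, base_s, cap_s)
    for attempt in range(retries + 1):
        try:
            send(batch)
            return attempt + 1
        except Exception:
            if attempt == retries:
                raise
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the function testable without real waiting; adding random jitter to each delay is a common refinement when many senders retry at once.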
Data Flow
- Configuration Loading: Configuration is loaded from the configured sources (YAML config, files, HTTP, commands)
- Target Discovery: Targets are discovered and scheduled
- Probe Execution: Workers execute probes according to the schedule
- Result Processing: Results are normalized and deduplicated
- Storage: Results are written to the WAL
- Metrics Update: Metrics are updated in real time
- Export: Metrics are exported in pull or push mode
Key Design Principles
Reliability
- WAL ensures no data loss
- Retry logic with exponential backoff
- Graceful shutdown with proper cleanup
Performance
- Efficient scheduling with jitter
- Rate limiting to prevent overload
- Batch processing for exports
Observability
- Comprehensive metrics
- Structured logging
- Health and readiness checks
Extensibility
- Pluggable probe types
- Interface-based design
- Hot reload support
Next Steps
- Design Principles: Architectural decisions
- Component Architecture: Detailed component design
- Data Flow: How data moves through the system