Health & Monitoring
Monitor system health, check service availability, and diagnose issues with the health check endpoints below. The detailed health check is routed through MainOrchestrator.health_check(), which verifies all pipelines and services. These endpoints are intended for production deployments and automated monitoring.
GET /api/v1/health - Detailed Health Check
Comprehensive health check with detailed service information. Requests are routed through MainOrchestrator.health_check(), which verifies all pipelines and services.
GET /health - Root Health Check
Basic health check for overall system status. No authentication required.
Request
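A minimal client sketch for this endpoint, using only the Python standard library. The base URL http://localhost:8000 is a placeholder, and sending the detailed flag (see Query Parameters) as the string "true" is an assumption about how the server parses booleans. Connection failures surface as urllib.error.URLError; non-2xx replies are returned together with whatever JSON body the server sent.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple
from urllib.parse import urlencode


def health_url(base_url: str, path: str = "/health", detailed: bool = False) -> str:
    """Build a health check URL; omitting 'detailed' leaves the server default (false)."""
    url = base_url.rstrip("/") + path
    if detailed:
        # Assumption: the server accepts the lowercase string "true" for booleans.
        url += "?" + urlencode({"detailed": "true"})
    return url


def decode_body(raw: bytes) -> Dict[str, Any]:
    """Decode a JSON object; return {} for empty, non-JSON, or non-object bodies."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return {}
    return data if isinstance(data, dict) else {}


def fetch_json(url: str, timeout: float = 5.0) -> Tuple[int, Dict[str, Any]]:
    """GET url and return (HTTP status code, decoded JSON body)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, decode_body(response.read())
    except urllib.error.HTTPError as err:
        # Non-2xx replies raise HTTPError; the body may still describe the failure.
        return err.code, decode_body(err.read())
```

For example, fetch_json(health_url("http://localhost:8000", detailed=True)) requests http://localhost:8000/health?detailed=true, and passing path="/api/v1/health" targets the detailed health check instead.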
Response
Response Fields
| Field | Type | Description |
|---|---|---|
| status | string | Overall system status: "ok", "degraded", or "down" |
| system | string | System identifier |
| version | string | API version number |
| uptime_seconds | integer | System uptime in seconds |
| pipelines | object | Status of each processing pipeline |
| services | object | Connection status for backend services |
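The Response Fields table maps onto a small typed model. This sketch assumes pipelines and services may be absent (for example, in a non-detailed response) and keeps their contents as untyped dictionaries, since their inner structure is not documented here. It accepts both "ok" (listed in this table) and "healthy" (used in the Status Values table) as the all-clear value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping

# "ok" appears in the Response Fields table, "healthy" in the Status Values table.
KNOWN_STATUSES = {"ok", "healthy", "degraded", "down"}


@dataclass
class HealthReport:
    status: str
    system: str
    version: str
    uptime_seconds: int
    pipelines: Dict[str, Any] = field(default_factory=dict)
    services: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "HealthReport":
        """Validate a decoded JSON body; raises KeyError, ValueError, or TypeError on bad input."""
        status = data["status"]
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        uptime = data["uptime_seconds"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(uptime, int) or isinstance(uptime, bool):
            raise TypeError("uptime_seconds must be an integer")
        return cls(
            status=status,
            system=str(data["system"]),
            version=str(data["version"]),
            uptime_seconds=uptime,
            pipelines=dict(data.get("pipelines") or {}),
            services=dict(data.get("services") or {}),
        )

    @property
    def is_passing(self) -> bool:
        return self.status in {"ok", "healthy"}
```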
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| detailed | boolean | Include detailed service diagnostics (default: false) |
GET /metrics - System Metrics
Prometheus-format metrics for system monitoring.
Request
Response
Returns metrics in the Prometheus text exposition format, covering route availability, service status, and timestamps.
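A minimal standard-library sketch for reading this output in Python. The metric names the API exposes are not listed here, so any names used with it are hypothetical. The parser skips # HELP and # TYPE comment lines, leaves label values unescaped, and ignores lines it cannot match; for production use, the official prometheus_client package ships a complete parser (prometheus_client.parser.text_string_to_metric_families).

```python
import re
from typing import Dict, List, Tuple

# One sample line: name{label="value",...} value [timestamp_ms]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>(?:[^"}]|"(?:[^"\\]|\\.)*")*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
# label_name="value"; the value may contain braces and escaped characters such as \"
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Return (metric name, labels, value) for each sample line in the text."""
    samples: List[Sample] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE metadata
        match = _SAMPLE.match(line)
        if match is None:
            continue  # a line this minimal parser does not understand
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```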
Visualization Service Health
Check the health of the visualization generation service (EnhancedKGVisualizer).
GET /api/visualizations/health
Response
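The response body of this endpoint is not specified here, so this sketch treats any 2xx status as healthy, which is an assumption. Because Visualization is a non-critical service (see Service-Specific Status), a client can use a check like this to hide visualization features rather than fail outright. The function name and base URL are illustrative.

```python
import urllib.request


def visualizations_available(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /api/visualizations/health answers with a 2xx status."""
    url = base_url.rstrip("/") + "/api/visualizations/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError (non-2xx), timeouts, refused connections
        return False
```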
Health Status Interpretation
Status Values
| Status | Meaning | Action Required |
|---|---|---|
| healthy | All systems operational | None - system is working normally |
| degraded | Some services impaired | Monitor closely, investigate non-critical issues |
| down | Critical failure | Immediate action required |
Service-Specific Status
| Service | Critical? | Impact if Down |
|---|---|---|
| Neo4j | Yes | No KG queries or ingestion possible |
| Qdrant | Yes | No vector search or retrieval |
| SQL Database | Yes | No metadata or source tracking |
| Redis | No | Reduced performance (no caching) |
| Visualization | No | Visualizations unavailable |
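A client-side sketch of how the Critical? column can be turned into an overall status, for example when a dashboard only has per-service up/down flags. Mapping any critical outage to "down" and any non-critical outage to "degraded" is an interpretation of the two tables above, not a documented server rule, and the service keys are hypothetical. Missing or unknown services are treated as critical so that gaps are not silently reported as healthy.

```python
from typing import Dict, Mapping

# Criticality from the Service-Specific Status table. The key names are
# hypothetical; match them to the keys your deployment reports under "services".
CRITICAL: Dict[str, bool] = {
    "neo4j": True,
    "qdrant": True,
    "sql": True,
    "redis": False,
    "visualization": False,
}


def overall_status(service_up: Mapping[str, bool]) -> str:
    """Derive "healthy", "degraded", or "down" from per-service up/down flags."""
    names = set(CRITICAL) | set(service_up)
    # A service absent from service_up counts as down.
    down = {name for name in names if not service_up.get(name, False)}
    # Unknown services default to critical.
    if any(CRITICAL.get(name, True) for name in down):
        return "down"
    if down:
        return "degraded"
    return "healthy"
```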
Monitoring Integration
Prometheus Metrics Export
Docker Health Check
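A probe sketch for a Docker HEALTHCHECK, which treats exit code 0 as healthy and 1 as unhealthy. The default URL and the HEALTH_URL environment variable are assumptions, and counting "degraded" as passing is a design choice: a degraded instance still serves requests, so restarting it would not help. Saved as healthcheck.py in the container's working directory, it could be wired up with a Dockerfile line such as HEALTHCHECK --interval=30s --timeout=10s CMD python -c "import sys, healthcheck; sys.exit(healthcheck.probe())", and the same command can back a Kubernetes exec liveness probe, which also treats exit code 0 as success.

```python
import json
import os
import urllib.request
from typing import Any, Mapping

# Hypothetical default; override with the HEALTH_URL environment variable.
DEFAULT_URL = "http://localhost:8000/health"
# Statuses that keep the container marked healthy. "degraded" is included on
# purpose: the service still answers requests, so a restart would not help.
PASSING = {"ok", "healthy", "degraded"}


def exit_code_for(payload: Mapping[str, Any]) -> int:
    """0 (healthy) if the reported status passes, 1 (unhealthy) otherwise."""
    return 0 if payload.get("status") in PASSING else 1


def probe(timeout: float = 5.0) -> int:
    """Query the health endpoint once and translate the result into an exit code."""
    url = os.environ.get("HEALTH_URL", DEFAULT_URL)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):  # refused, timed out, non-2xx reply, or non-JSON body
        return 1
    return exit_code_for(payload) if isinstance(payload, dict) else 1
```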
Kubernetes Liveness Probe
Best Practices
- Regular checks: Poll health endpoints every 30-60 seconds
- Timeout handling: Set appropriate timeouts (5-10 seconds)
- Retry logic: Implement exponential backoff for transient failures
- Alerting: Set up alerts for degraded/down status
- Dashboard: Display real-time health status in the admin UI
- Logging: Log all health check failures for debugging
- Graceful degradation: When a non-critical service (Redis, Visualization) fails, keep serving requests and disable only the affected features
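The practices above can be combined into a small polling loop. In this sketch the fetch, alert, and sleep callables are hypothetical hooks injected so it runs without a live server; fetch is expected to return the decoded JSON body and to raise OSError subclasses (such as urllib.error.URLError or TimeoutError) on failure. The backoff doubles from base_delay up to max_delay without jitter; adding random jitter is a common refinement when many clients poll the same service.

```python
import logging
import time
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("health_poller")

# "ok" and "healthy" both appear in this document as the all-clear value.
HEALTHY = {"ok", "healthy"}

Fetch = Callable[[], Dict[str, Any]]


def fetch_with_backoff(fetch: Fetch, attempts: int = 4, base_delay: float = 1.0,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[Dict[str, Any]]:
    """Call fetch(); after failure n wait base_delay * 2**n seconds (capped), then retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # URLError, TimeoutError, ConnectionError are OSError subclasses
            log.warning("health check failed (attempt %d/%d): %s", attempt + 1, attempts, exc)
            if attempt + 1 < attempts:
                sleep(min(max_delay, base_delay * 2 ** attempt))
    return None


def poll_once(fetch: Fetch, alert: Callable[[str], None], **retry: Any) -> str:
    """Run one check with retries and alert on anything other than a healthy status."""
    payload = fetch_with_backoff(fetch, **retry)
    status = "unreachable" if payload is None else str(payload.get("status", "unknown"))
    if status not in HEALTHY:
        alert(status)
    return status


def poll_forever(fetch: Fetch, alert: Callable[[str], None], interval: float = 30.0) -> None:
    """Check every `interval` seconds (within the 30-60 s range suggested above)."""
    while True:
        poll_once(fetch, alert)
        time.sleep(interval)
```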