Structured JSON logs · 30-day retention
Shipped
If you run an ops function — monitoring deploys, watching for anomalies, paging on incidents — this page is for you. TheoCloud ships structured logs, RED+USE+agent-specific metrics, configurable alerts, and integration with the SIEM/APM you already use.
Every container deploy emits JSON-structured logs (level, timestamp, request-id, span-id). 30-day retention by default. Pro+ exports to your own sink (S3, Datadog, Splunk HEC, Elastic).
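As a rough illustration of working with these logs after export, the sketch below parses JSON log lines and pulls out the error records for one request. The key spellings (`request_id`, `span_id`, `message`) and the sample lines are assumptions made for the example; the page lists the fields but not their exact names.

```python
import json

# Hypothetical log lines. The key names are assumed spellings of the fields
# listed above (level, timestamp, request-id, span-id); the real keys may differ.
SAMPLE_LINES = [
    '{"level": "info", "timestamp": "2025-01-01T12:00:00Z", '
    '"request_id": "req-1", "span_id": "sp-1", "message": "request started"}',
    '{"level": "error", "timestamp": "2025-01-01T12:00:01Z", '
    '"request_id": "req-1", "span_id": "sp-2", "message": "upstream timeout"}',
    "not json at all",
]


def parse_log_lines(lines):
    """Parse JSON log lines, skipping anything that is not a JSON object."""
    records = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output in the stream
        if isinstance(record, dict):
            records.append(record)
    return records


def errors_for_request(records, request_id):
    """Return the error-level records that belong to one request."""
    return [
        r for r in records
        if r.get("request_id") == request_id and r.get("level") == "error"
    ]


if __name__ == "__main__":
    for record in errors_for_request(parse_log_lines(SAMPLE_LINES), "req-1"):
        print(record["timestamp"], record["span_id"], record["message"])
```

Because every record carries a request id and a span id, the same filter can group all spans of a request when tracing a failure across services.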
Rate, Errors, Duration (RED) + Utilization, Saturation, Errors (USE) for compute, plus agent-specific spans (LLM call latency, tokens consumed, sub-agent fan-out). Available per environment and per deploy.
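To make the RED terms concrete, here is a minimal sketch that derives rate, error ratio, and mean duration from a window of request records. The `Request` type and `red_summary` function are hypothetical names for illustration and are not part of any TheoCloud API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    is_error: bool


def red_summary(requests, window_seconds):
    """Compute Rate, Errors, Duration for requests seen in one time window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "mean_duration_ms": 0.0}
    errors = sum(1 for r in requests if r.is_error)
    return {
        "rate_per_s": total / window_seconds,       # Rate: requests per second
        "error_ratio": errors / total,              # Errors: failed fraction
        "mean_duration_ms": sum(r.duration_ms for r in requests) / total,  # Duration
    }


if __name__ == "__main__":
    window = [Request(100, False), Request(200, False),
              Request(300, True), Request(400, False)]
    print(red_summary(window, window_seconds=2))
```

Mean duration is used here only to keep the sketch short; tail percentiles are usually more informative for latency.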
Every deploy is gated by readiness + liveness probes before traffic shifts. Probes are declared in theo.yaml (open format). Failed probes hold the deploy and surface diagnostic logs.
Configurable alerts on error rate spike, LLM token budget threshold, deploy failure, health probe failure. Delivery to email + webhook (Slack, Discord, custom). PagerDuty integration in Team tier.
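If you relay alerts to chat tools yourself, the body shape differs per target: Slack incoming webhooks accept a JSON object with a `text` field, and Discord webhooks accept one with a `content` field. The sketch below builds those bodies from an alert record. The alert fields (`severity`, `kind`, `environment`, `detail`) are hypothetical, since the page does not document the alert payload.

```python
import json


def build_webhook_body(alert, target):
    """Encode an alert as a JSON request body for the given webhook target."""
    summary = (
        f"[{alert['severity']}] {alert['kind']} "
        f"in {alert['environment']}: {alert['detail']}"
    )
    if target == "slack":
        payload = {"text": summary}       # Slack incoming-webhook shape
    elif target == "discord":
        payload = {"content": summary}    # Discord webhook shape
    else:
        payload = dict(alert)             # custom receivers get the raw record
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical alert record for illustration only.
    alert = {
        "severity": "sev-2",
        "kind": "error_rate_spike",
        "environment": "production",
        "detail": "error rate 7% over 5 min",
    }
    print(build_webhook_body(alert, "slack").decode("utf-8"))
```

The returned bytes would be sent as the body of an HTTP POST with `Content-Type: application/json`; the network call is left out so the sketch stays self-contained.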
Pre-built dashboards (Deploys, Errors, LLM spend, Latency p50/p95/p99). Custom dashboard builder in Team tier — drag-and-drop, shareable per-environment, embeddable in Confluence/Notion.
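For readers unfamiliar with the p50/p95/p99 notation, the following sketch computes latency percentiles with the nearest-rank method. This is one common definition, chosen here for simplicity; the page does not say which method the dashboards use, and `percentile` is a hypothetical helper.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # synthetic samples: 1 ms .. 100 ms
    for p in (50, 95, 99):
        print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

A p99 of 99 ms on this synthetic data means 99% of requests finished in 99 ms or less, which is why tail percentiles expose slow outliers that an average hides.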
Auto-rollback on Sev-1 (error rate >5% sustained 3 min). Manual rollback always available via `theo rollback`. Incident timeline auto-generated. Post-mortem template per incident.
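The Sev-1 rule above (error rate above 5%, sustained for 3 minutes) can be read as a sustained-threshold check. The sketch below is one possible interpretation, assuming periodic error-rate samples; how TheoCloud actually samples and evaluates the window is not described on this page, and `should_auto_rollback` is a hypothetical name.

```python
def should_auto_rollback(samples, threshold=0.05, sustain_seconds=180):
    """Return True once the error rate stays above threshold long enough.

    samples: iterable of (timestamp_seconds, error_rate), sorted by time.
    """
    breach_start = None
    for timestamp, error_rate in samples:
        if error_rate > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            if timestamp - breach_start >= sustain_seconds:
                return True
        else:
            breach_start = None  # any recovery resets the clock
    return False


if __name__ == "__main__":
    sustained = [(0, 0.06), (60, 0.07), (120, 0.08), (180, 0.09)]
    interrupted = [(0, 0.06), (60, 0.07), (120, 0.01), (180, 0.09), (240, 0.09)]
    print(should_auto_rollback(sustained))    # breach lasts the full 180 s
    print(should_auto_rollback(interrupted))  # dip at 120 s resets the window
```

Resetting on any recovery keeps a brief spike from triggering a rollback, at the cost of reacting more slowly to an error rate that flaps around the threshold.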
Log + metric export via API key
Log forwarding to your indexer
Log + metric forwarding
Long-term log archival
Incident routing (Team tier)
Alert webhook delivery
5-step deploy flow including observability stage (logs, metrics, agent-specific spans, `theo rollback`).
Public status page with per-surface state + GitHub incident log.
Custom dashboard builder ships in Team tier; pre-built dashboards (Deploys, Errors, LLM spend, Latency p50/p95/p99) are available on every tier. Incident workflow with auto-rollback ships on Enterprise contracts — request via platform@usetheo.dev.