available for senior SRE / platform / AI-infra roles · Mesa, AZ · remote
$ devops · sre · aiops engineer
10+ years building scalable, reliable cloud-native infrastructure and internal platforms — supporting 100M+ users. Now focused on AI infrastructure, AIOps, and MCP platforms that put LLMs to work in production SRE workflows.
Selected work · AI-Core / AIOps
Tools I built at Life360 to drive AI adoption across infrastructure. Eagle-Eye is open-sourced as an MCP server and has picked up 10+ stars from the community.
SRE Copilot: AI observability chat assistant for root-cause analysis. Natural-language querying of metrics, logs and traces across Datadog APM, Kubernetes and PagerDuty.
Single-pane-of-glass for on-call engineers — unifies metrics, logs, alerts and debugging context so responders stop tab-hopping during incidents.
Self-hosted Grafana + OTEL stack that tracks Claude Code CLI cost and token usage and, uniquely, captures skill/tool invocation analytics. One make up command spins up Prometheus, Grafana, Loki and an OTEL Collector with pre-built dashboards. Local-only: no SaaS, no data egress.
Capacity-planning MCP server that combines Datadog time-series data with Facebook Prophet to forecast traffic spikes and proactively size CPU, memory and HPA settings for peak events.
Vibe-coding platform on top of the existing GitOps stack for one-click deploys of AI apps. Adopted by 10+ internal teams; biggest win in Ads & Marketing.
Career log
Cloud Success team: CI/CD, observability and release pipelines. Built Eagle-Eye (open-source MCP server, 10+ stars), a Prophet-based capacity-planning MCP server and the SLICK platform; evaluated 30+ AI dev tools and provisioned a dedicated EKS cluster for MCP workloads.
Owned observability for 125+ k8s services supporting 100M+ users. Replaced logging agents with Vector (6 TB+/day), ran Prometheus on EKS and led an OpenTelemetry PoC plus eBPF-based service mapping.
Founding engineer at a pre-seed data platform. Built full AWS infra with Terraform/Terragrunt/Atlantis, multi-region EKS, ArgoCD GitOps and SLI/SLO frameworks.
End-to-end cloud & Kubernetes infrastructure for startups; DevOps advisory to 5+ SaaS portfolio companies under Together Fund.
Built the entire cloud + Kubernetes platform for a $210M neobank — 20+ microservices on multi-region EKS, CI/CD, GitOps and Prometheus/Grafana observability.
Delivered AWS cost optimizations saving $336K/yr for Careem Pay. Migrated production from Elastic Beanstalk to EKS; 24/7 on-call.
Ran large-scale CI/CD on Jenkins + Mesos; migrated payment workloads AWS→Azure for ~20% infra savings.
Linux administration for 25+ global clients; handled security incidents, mitigating spam, spoofing and intrusions.
Toolchain
Let's talk
Open to senior SRE, platform and AI-infrastructure roles. The fastest way to reach me is email.