SRE Copilot
AI observability assistant for root-cause analysis. Ask in plain language and it queries metrics, logs and traces across Datadog APM, Kubernetes and PagerDuty.
devops · sre · aiops engineer
I keep large-scale, cloud-native systems reliable — and I'm now putting LLMs to work inside production SRE workflows through MCP and AIOps.
Components · AI-Core / AIOps
Tools I built at Life360 to drive AI adoption across infrastructure. Eagle-Eye is open-sourced as an MCP server and has 10+ stars.
A single pane of glass for on-call engineers — unifies metrics, logs, alerts and debugging context so responders stop tab-hopping during incidents.
Self-hosted Grafana + OTEL stack that tracks Claude Code CLI cost and token usage, and uniquely captures skill/tool invocation analytics. Running "make up" brings up Prometheus, Grafana, Loki and an OTEL Collector with ready-made dashboards. Local only: no SaaS, no data leaves your machine.
A capacity-planning MCP server built on Datadog time-series and Prophet. It forecasts traffic spikes and sizes CPU, memory and HPA settings ahead of peak events.
A vibe-coding platform on top of the existing GitOps stack for one-click deploys of AI apps. Adopted by 10+ internal teams; biggest win in Ads & Marketing.
Uptime history · career log
Cloud Success team — CI/CD, observability and release pipelines. Built Eagle-Eye (open-sourced MCP server, 10+ stars), a Prophet-based capacity-planning MCP and the SLICK platform; evaluated 30+ AI dev tools and provisioned a dedicated EKS cluster for MCP workloads.
Owned observability for 125+ Kubernetes services supporting 100M+ users. Replaced logging agents with Vector (6TB+/day), ran Prometheus on EKS, and led an OpenTelemetry PoC and eBPF service mapping.
Founding engineer at a pre-seed data platform. Built full AWS infra with Terraform/Terragrunt/Atlantis, multi-region EKS, ArgoCD GitOps and SLI/SLO frameworks.
End-to-end cloud & Kubernetes infrastructure for startups; DevOps advisory to 5+ SaaS portfolio companies under Together Fund.
Built the entire cloud + Kubernetes platform for a $210M neobank — 20+ microservices on multi-region EKS, CI/CD, GitOps and Prometheus/Grafana observability.
Delivered AWS cost optimizations saving $336K/yr for Careem Pay. Migrated production from Elastic Beanstalk to EKS; 24/7 on-call.
Ran large-scale CI/CD on Jenkins + Mesos; migrated payment workloads AWS→Azure for ~20% infra savings.
Linux administration for 25+ global clients; handled security incidents — spam, spoofing and intrusion mitigation.
▮ operational — every role shipped and stayed up.
Monitored systems · toolchain
Get in touch
Open to senior SRE, platform and AI-infrastructure roles. Email is the fastest way to reach me.