all systems operational / open to senior SRE · platform · AI-infra roles
Mesa, AZ · remote · 2015 → now

Neel Thomas Thelly

devops · sre · aiops engineer

I keep large-scale, cloud-native systems reliable — and I'm now putting LLMs to work inside production SRE workflows through MCP and AIOps.

View résumé ↓ · Download PDF · GitHub ↗ · LinkedIn ↗

Components · AI-Core / AIOps

Putting LLMs to work in production SRE workflows

Tools I built at Life360 to drive AI adoption across infrastructure. Eagle-Eye is open-sourced as an MCP server and has 10+ stars.

01 open source · MCP server

SRE Copilot

AI observability assistant for root-cause analysis. Ask in plain language and it queries metrics, logs and traces across Datadog APM, Kubernetes and PagerDuty.

Claude Sonnet · LangGraph · Datadog · Kubernetes · MCP


02 open source · MCP server · 10+ ★

Eagle-Eye

A single pane of glass for on-call engineers — unifies metrics, logs, alerts and debugging context so responders stop tab-hopping during incidents.

Observability · MCP · Incident response · On-call


03 open source · observability

Claude Code Observability

Self-hosted Grafana + OTEL stack that tracks Claude Code CLI cost and token usage, and also captures skill and tool invocation analytics. A single "make up" command starts Prometheus, Grafana, Loki and an OTEL Collector with ready-made dashboards. Local only: no SaaS, and no data leaves your machine.

Grafana · OpenTelemetry · Prometheus · Loki · Docker


04 capacity planning

Forecasting MCP

A capacity-planning MCP over Datadog time-series and Prophet that forecasts traffic spikes and sizes CPU, memory and HPA ahead of peak events.

Prophet · Time-series · HPA · Forecasting

05 internal platform

SLICK — Streamlit One-Click

A vibe-coding platform on top of the existing GitOps stack for one-click deploys of AI apps. Adopted by 10+ internal teams; biggest win in Ads & Marketing.

Streamlit · GitOps · Databricks MCP · Function calling

Uptime history · career log

A decade keeping systems up

Aug 2025 — Present

Sr Software Engineer, DevOps · Life360

Cloud Success team — CI/CD, observability and release pipelines. Built Eagle-Eye (open-sourced MCP server, 10+ stars), a Prophet-based capacity-planning MCP and the SLICK platform; evaluated 30+ AI dev tools and provisioned a dedicated EKS cluster for MCP workloads.

Jun 2023 — Jul 2025

Sr Software Engineer, Network Operations · Life360

Owned observability for 125+ Kubernetes services supporting 100M+ users. Replaced logging agents with Vector (6TB+/day), ran Prometheus on EKS, and led an OpenTelemetry PoC and eBPF service mapping.

Nov 2022 — May 2023

DevOps Engineer · Dview.io

Founding engineer at a pre-seed data platform. Built full AWS infra with Terraform/Terragrunt/Atlantis, multi-region EKS, ArgoCD GitOps and SLI/SLO frameworks.

May 2022 — Nov 2022

DevOps Consultant · Freelance

End-to-end cloud & Kubernetes infrastructure for startups; DevOps advisory to 5+ SaaS portfolio companies under Together Fund.

Apr 2021 — May 2022

Site Reliability Engineer · Zolve

Built the entire cloud + Kubernetes platform for a $210M neobank — 20+ microservices on multi-region EKS, CI/CD, GitOps and Prometheus/Grafana observability.

Sep 2019 — Sep 2020

Site Reliability Engineer · Careem (an Uber company)

Delivered AWS cost optimizations saving $336K/yr for Careem Pay. Migrated production from Elastic Beanstalk to EKS; 24/7 on-call.

Oct 2017 — Aug 2019

DevOps Engineer · Ola Cabs

Ran large-scale CI/CD on Jenkins + Mesos; migrated payment workloads AWS→Azure for ~20% infra savings.

May 2015 — Sep 2017

Linux System Engineer · Poornam Info Vision

Linux administration for 25+ global clients; handled security incidents — spam, spoofing and intrusion mitigation.

▮ operational — every role shipped and stayed up.

Monitored systems · toolchain

Skills

Languages & IaC

Go · Bash · Terraform · Terragrunt · CloudFormation

Cloud & Kubernetes

AWS · Azure · EKS · ArgoCD · Argo Rollouts · KEDA · Kyverno · Istio · Karpenter

Observability

Prometheus · Grafana · Datadog · New Relic · Vector · OpenTelemetry · eBPF

AI Infrastructure

MCP · Claude · LangGraph · RAG · Prophet · Langfuse · Streamlit

Get in touch

Reliable systems, now with AI in the loop.

Open to senior SRE, platform and AI-infrastructure roles. Email is the fastest way to reach me.

[email protected]