available for senior SRE / platform / AI-infra roles · Mesa, AZ · remote

Neel Thomas

$ devops · sre · aiops engineer

10+ years building scalable, reliable cloud-native infrastructure and internal platforms — supporting 100M+ users. Now focused on AI infrastructure, AIOps, and MCP platforms that put LLMs to work in production SRE workflows.

10+ years in DevOps / SRE
100M+ users supported
6TB+ logs / day pipelined
125+ k8s services observed

Selected work · AI-Core / AIOps

Putting LLMs to work in production SRE

Tools I built at Life360 to drive AI adoption across infrastructure — Eagle-Eye open-sourced as an MCP server with growing community traction (10+ stars).

open source · MCP server

SRE Copilot

AI observability chat assistant for root-cause analysis. Natural-language querying of metrics, logs and traces across Datadog APM, Kubernetes and PagerDuty.

Claude Sonnet · LangGraph · Datadog · Kubernetes · MCP

open source · MCP server · 10+ ★

Eagle-Eye

Single-pane-of-glass for on-call engineers — unifies metrics, logs, alerts and debugging context so responders stop tab-hopping during incidents.

Observability · MCP · Incident response · On-call

open source · observability

Claude Code Observability

Self-hosted Grafana + OTEL stack to track Claude Code CLI cost and token usage, and it uniquely captures skill/tool invocation analytics. A single "make up" command spins up Prometheus, Grafana, Loki and an OTEL Collector with pre-built dashboards. Local-only; no SaaS, no data egress.

Grafana · OpenTelemetry · Prometheus · Loki · Docker

capacity planning

Forecasting MCP

Capacity-planning MCP that combines Datadog time-series data with Facebook Prophet to forecast traffic spikes and proactively size CPU, memory and HPA settings for peak events.

Prophet · Time-series · HPA · Forecasting

internal platform

SLICK — Streamlit One-Click

Vibe-coding platform on top of the existing GitOps stack for one-click deploys of AI apps. Adopted by 10+ internal teams; biggest win in Ads & Marketing.

Streamlit · GitOps · Databricks MCP · Function calling

Career log

Experience

Sr Software Engineer, DevOps · Life360 Aug 2025 — Present

Cloud Success team — CI/CD, observability and release pipelines. Built Eagle-Eye (open-sourced MCP server, 10+ stars), a Prophet-based capacity-planning MCP and the SLICK platform; evaluated 30+ AI dev tools and provisioned a dedicated EKS cluster for MCP workloads.

Sr Software Engineer, Network Operations · Life360 Jun 2023 — Jul 2025

Owned observability for 125+ k8s services supporting 100M+ users. Replaced logging agents with Vector (6TB+/day), ran Prometheus on EKS, led an OpenTelemetry PoC and eBPF service mapping.

DevOps Engineer · Dview.io Nov 2022 — May 2023

Founding engineer at a pre-seed data platform. Built full AWS infra with Terraform/Terragrunt/Atlantis, multi-region EKS, ArgoCD GitOps and SLI/SLO frameworks.

DevOps Consultant · Freelance May 2022 — Nov 2022

End-to-end cloud & Kubernetes infrastructure for startups; DevOps advisory to 5+ SaaS portfolio companies under Together Fund.

Site Reliability Engineer · Zolve Apr 2021 — May 2022

Built the entire cloud + Kubernetes platform for a $210M neobank — 20+ microservices on multi-region EKS, CI/CD, GitOps and Prometheus/Grafana observability.

Site Reliability Engineer · Careem (an Uber company) Sep 2019 — Sep 2020

Delivered AWS cost optimizations saving $336K/yr for Careem Pay. Migrated production from Elastic Beanstalk to EKS; 24/7 on-call.

DevOps Engineer · Ola Cabs Oct 2017 — Aug 2019

Ran large-scale CI/CD on Jenkins + Mesos; migrated payment workloads AWS→Azure for ~20% infra savings.

Linux System Engineer · Poornam Info Vision May 2015 — Sep 2017

Linux administration for 25+ global clients; handled security incidents — spam, spoofing and intrusion mitigation.

Toolchain

Skills

Languages & IaC

Go · Bash · Terraform · Terragrunt · CloudFormation

Cloud & Kubernetes

AWS · Azure · EKS · ArgoCD · Argo Rollouts · KEDA · Kyverno · Istio · Karpenter

Observability

Prometheus · Grafana · Datadog · New Relic · Vector · OpenTelemetry · eBPF

AI Infrastructure

MCP · Claude · LangGraph · RAG · Prophet · Langfuse · Streamlit

Let's talk

Building reliable systems —
now with AI in the loop.

Open to senior SRE, platform and AI-infrastructure roles. The fastest way to reach me is email.