Raoul Mahtani

SRE @ Flatiron Health

Senior Site Reliability Engineer

I build reliable infrastructure, automate away operational toil, and design systems that scale.

I'm currently a Senior Site Reliability Engineer at Flatiron Health, where I focus on infrastructure automation, observability, and reliability engineering for healthcare systems. Outside of work I build tools, experiment with distributed systems, and write about reliability engineering and system design.

About

I'm a Site Reliability Engineer focused on making complex systems predictable, observable, and easy to operate.

My work sits at the intersection of infrastructure automation, distributed systems, and reliability engineering. I care about designing systems that are not only scalable, but also operationally simple and resilient under failure.

Professionally, I've spent the last several years working on:

  • Infrastructure automation with Terraform
  • Cloud architecture on AWS
  • Reliability engineering using SLOs and observability
  • Reducing operational toil through automation

At Flatiron Health I've worked on modernizing infrastructure, improving reliability, and reducing operational overhead across production environments. Outside of my day job I build side projects focused on distributed systems, developer tooling, and automation.

Work Experience

2022 – Present

Senior Site Reliability Engineer

Flatiron Health

  • Led a migration from Ansible-based provisioning to Terraform-based Infrastructure as Code, modularizing infrastructure so entire client environments can be provisioned in minutes instead of weeks.
  • Helped transition a critical healthcare application from monolithic infrastructure to a load-balanced architecture, dramatically improving system reliability.
  • Implemented infrastructure optimization initiatives that reduced cloud spend by over $1M annually through right-sizing and usage analysis.
  • Built automation and observability tooling to reduce operational toil and improve system visibility across production environments.

Jun 2017 – May 2022

Site Reliability Engineer

BlackRock

  • Authored deployment playbooks and infrastructure automation using HashiCorp tooling (Terraform, Packer, Vault).
  • Built Kafka consumers to surface real-time platform metrics for operational visibility.
  • Delivered automation across Node.js/Cassandra/Angular services, reducing manual intervention in the deployment pipeline.

Jul 2014 – May 2017

GTI Rotational Analyst

BlackRock

  • Administered enterprise Windows infrastructure and led storage zoning projects.
  • Ran disaster recovery rehearsals for the East Coast data center.

Projects

Daedalus — Distributed Systems Project

A distributed agent platform designed for large-scale infrastructure introspection and automation. The system allows a central control node to query and orchestrate thousands of agents across an infrastructure environment.

Key capabilities

  • Horizontal scalability across thousands of nodes
  • Secure agent registration using API key authentication
  • Distributed command execution across infrastructure fleets
  • Centralized query execution with aggregated results
  • Web interface for issuing queries and monitoring node status
  • Structured response collection and searchable history

Use cases

  • Infrastructure inventory
  • Fleet-wide diagnostics
  • Operational automation
  • Distributed command execution

Future iterations will include natural language queries translated into system commands using open-source language models.
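
A minimal sketch of how API-key registration and fan-out queries with aggregated results could fit together, assuming an in-process control node and agents modeled as plain callables. The names ControlNode, register, and query are hypothetical and are not taken from Daedalus itself; a real deployment would reach agents over the network and manage keys per agent.

```python
import hmac
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ControlNode:
    """Hypothetical control node: registers agents and fans queries out to them."""

    api_key: str
    # Maps agent id -> callable that runs a command on that agent (assumed shape).
    agents: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, agent_id: str, presented_key: str,
                 handler: Callable[[str], str]) -> bool:
        """Admit an agent only if it presents the expected API key."""
        # compare_digest avoids leaking key contents through timing differences.
        if not hmac.compare_digest(presented_key.encode(), self.api_key.encode()):
            return False
        self.agents[agent_id] = handler
        return True

    def query(self, command: str, max_workers: int = 32) -> Dict[str, dict]:
        """Send one command to every registered agent and aggregate the results."""

        def run(item):
            agent_id, handler = item
            try:
                return agent_id, {"ok": True, "output": handler(command)}
            except Exception as exc:
                # One failing agent should not abort the whole fleet query.
                return agent_id, {"ok": False, "error": str(exc)}

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return dict(pool.map(run, list(self.agents.items())))
```

Calling node.query("uptime") on a node with registered agents returns a dictionary keyed by agent id, where each value records either the agent's output or the error it raised. Capturing failures per agent keeps a single unreachable node from hiding results from the rest of the fleet.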

Engineering Philosophy

The most valuable infrastructure systems share a few traits.

Automation first

If something must be done more than once, it should be automated.

Observability over guesswork

Reliable systems require good telemetry: logs, metrics, traces, and meaningful alerts.

Operational simplicity

The best systems are not the most complex ones — they're the ones that are easiest to run in production.

Failure is inevitable

Systems should be designed assuming things will break.

Writing

I write occasionally about reliability engineering, distributed systems, infrastructure automation, and building engineering tools. Posts can be found in the blog section of this site.

Feb 13, 2026

Launching this space

First notes on what I’m building here: an introductory post for the new home for my ideas.

Resume

Full work history and technical background available as a PDF.

Download Resume

Personal

I grew up in Jamaica and have always been interested in how complex systems work — whether that's infrastructure, software, or organizations. Outside of engineering I enjoy traveling, fitness, writing, and exploring new places and ideas.

Contact

If you'd like to connect, collaborate, or discuss engineering ideas, feel free to reach out.

LinkedIn →