Webinar On Demand

Context Engineering for Self-Healing AI SRE

View a Complimentary Live Webinar Presented by Komodor

In this webinar, we’ll trace our own reliability journey, from reactive incident chaos to data-driven prevention and, ultimately, AI-powered self-healing. After analyzing over a million real production incidents, we hit the predictability paradox: if most Kubernetes outages follow recognizable patterns that we can systematically address, why do repeatable failures still catch teams off guard?

We discovered the undeniable truth that in modern, sprawling cloud-native infrastructures, no two issues are the same, and none exist in isolation. Deterministic approaches break at a certain scale, and AI agents can’t replace humans simply by executing a runbook. We’ll review the 6 main categories of failures, how the same error can have different root causes, why the same fix doesn’t always apply, and how to provide AI agents with the right context to achieve human-level reasoning during root cause analysis (RCA).

We’ll conclude with a forward-looking view of AI agents as reliability partners, a short demo, and a set of immediate, actionable steps attendees can take to reduce toil and begin building toward autonomous, self-healing operations.


Asaf Savich

AI Engineering Group Manager, Komodor

Speaker

Asaf is an Engineering Group Leader at Komodor, where his daily quest is to convince Kubernetes to behave (and, on good days, actually listen). Before that, he co-founded Genie AI and served as Director of Engineering at Kubiya.ai and Mend.io, building AI-powered DevOps tools, scaling teams from scratch, and shipping cloud-native systems that don’t just run, but run reliably. He’s especially passionate about using AI to give operators superpowers, flipping “firefighting” into “fire prevention.”

