The AI SRE Built for the Unknown

Herald predicts issues before alert thresholds fire, investigates without runbooks, and resolves novel incidents.

70%+ accuracy on novel incidents

herald — zsh
GET /api/external-tools/{id}/config is returning null
Analyzing telemetry and recent deploys…

Found 2 likely root causes. Pick one to investigate:

  • Schema Drift high confidence

    JiraToolConfig was updated without a migration — GET returns schema 2.1 but callers expect 1.4, causing silent null failures on auth_method.

  • Stale Credential Cache medium confidence

    The external_tool_auth TTL dropped from 3600s to 300s in v2.3.1, so tokens may expire mid-request and produce the intermittent 401s on /api/external-tools/{id}/config.

Real investigation on Herald production; sensitive details redacted.

Trusted in Production

Databricks
Anyscale
LlamaIndex
DataHub
Corelight
Snorkel
Monte Carlo
MotherDuck
Embrace
Eppo
Arize
DSPy

Why other AI SREs don't work

High Maintenance. Poor Coverage.

The metric stays below the alert threshold for most of the chart, then crosses it. Marker 1 identifies the threshold crossing. The area above the threshold after that crossing is labeled no runbook coverage, and marker 2 identifies the uncovered alert territory.

1 threshold model

Others Require Alert Thresholds

You have to instrument, tune thresholds for each data stream, and anticipate every failure mode worth watching. Miss one, and you're blind to it.

2 runbook coverage

Others Require Runbooks

You document every investigation workflow before it's needed. Maintain them as your stack evolves. When something novel breaks, there's no runbook and no investigation.

Every other AI SRE is purely reactive — and only handles failures someone already anticipated.

THE HERALD APPROACH

Stop reacting. Start preventing.

  1. Learn

    Herald builds a context graph before any alerts fire — observability, codebase, CI/CD, docs, and dependencies — so it knows what normal looks like and can work to solve any problem.

    context: Jira tool 65 · config read path · CUST-8291-X

  2. Detect

    No thresholds to set. Herald builds a custom anomaly detection model for each data stream and surfaces validated issues before your customers notice.

    validated signal: HTTP 500s rose to 18–26% over 22 minutes while other tenants stayed flat

  3. Investigate

    Never write another runbook. Herald evaluates multiple hypotheses simultaneously, each against the right data source, and delivers RCAs in minutes.

    RCA: Schema Drift · legacy Vault keys rejected after PR #4275
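To make the Detect step concrete, here is a minimal sketch of threshold-free, per-stream anomaly detection. It is not Herald's model, which this page does not describe. It assumes a simple exponentially weighted baseline of mean and variance, and every name and parameter in it (StreamBaseline, flag_anomalies, alpha, k, warmup) is hypothetical. The point is only that each stream learns its own notion of normal, so a jump to an 18% error rate is flagged even though nobody configured a threshold for it.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamBaseline:
    """Running baseline for ONE data stream (illustrative, not Herald's model)."""
    alpha: float = 0.1   # smoothing factor for the moving estimates (assumed)
    k: float = 4.0       # how many standard deviations count as anomalous (assumed)
    warmup: int = 10     # samples to observe before flagging anything (assumed)
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def observe(self, x: float) -> bool:
        """Return True if x deviates strongly from this stream's learned baseline."""
        if self.seen == 0:
            # The first sample only initializes the baseline.
            self.mean = x
            self.seen = 1
            return False

        deviation = x - self.mean
        std = math.sqrt(self.var)
        anomalous = (
            self.seen >= self.warmup
            and abs(deviation) > self.k * max(std, 1e-9)
        )

        if not anomalous:
            # Learn only from normal points so an incident does not
            # drag the baseline toward the failure state.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)

        self.seen += 1
        return anomalous


def flag_anomalies(values: list[float]) -> list[bool]:
    """Run a fresh baseline over a series and return one flag per sample."""
    baseline = StreamBaseline()
    return [baseline.observe(v) for v in values]
```

In use, you would keep one StreamBaseline per metric and per tenant. Feeding a series of error rates that hover near 1% and then jump to 18% flags only the final sample, while a hand-set static threshold of 50% would never fire on the same data.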

In Production

Results. Delivered Fast.

Herald's agent onboards and adapts quickly. Gartner's 2026 AI SRE Market Guide identifies proactive incident prevention and contextual awareness as next-generation capabilities. Herald already does both.

  • Results in days, not months. The Herald agent learns your stack quickly and delivers its first RCA within days.
  • Solves the unknown. 70%+ accuracy on novel incidents for one of the world's biggest B2B2C platforms.
  • Never repeats mistakes. Herald learns from every single investigation, so it never makes the same mistake twice.

Powered by UC Berkeley research

Herald was founded by PhDs and professors from UC Berkeley's innovation center, RISELab, combining expertise in AI, LLMs, data systems, and scalable infrastructure.

Evaluating an AI SRE?

One question matters

What's your agent's accuracy on novel incidents?

Book a Demo