
OpsConsole — Deployment Pipeline Observability Dashboard

A demo snapshot of an internal ops tool automating pipeline observability and on-call routing for small dev teams.

Industry: operations-tooling
Team Size: small-team
AI Agents: Claude Code, Cursor

Demo Snapshot

This is a curated demo snapshot. Real project data is reviewed before publishing.

PRD

Section Layout ↔ Industry-Standard PRD Template

Industry-Standard PRD Section | This Fixture's Section | Depth Notes
Problem Statement | Problem Definition | Slack channel saturation; PagerDuty lacks unified context view
Success Metrics | Success Metrics | Median MTTR 12 min to 4 min 30 s
Non-Functional Requirements | Non-Functional Requirements | viewer/operator/admin (3 roles); dashboard p95 under 3 s
(Domain-specific) | On-Call Flow | severity, throttle (10 min), rotation, escalation (15 min)
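The non-functional target of a dashboard p95 under 3 s can be checked against sampled page-load times. A minimal sketch, assuming latencies are collected in milliseconds and using the nearest-rank percentile definition; the function names `p95` and `meets_dashboard_slo` are hypothetical:

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the given latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def meets_dashboard_slo(samples_ms: list[float], budget_ms: float = 3000) -> bool:
    """True when the dashboard's p95 load time is under the 3 s budget."""
    return p95(samples_ms) < budget_ms
```

Nearest-rank always returns an observed sample; percentile definitions that interpolate between samples can give slightly different values on small sample sets.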

Project Overview

OpsConsole is an internal operations tool that helps teams of 5-20 engineers observe deploy pipeline events on a single dashboard and route alerts to the right on-call responder without drowning in Slack noise. As CI/CD runs grow to hundreds per day, shared Slack channels saturate with system logs and real outage signals get buried; OpsConsole targets that failure mode head-on. The product goal is an MTTR under five minutes, measured from detection to first human acknowledgment. To hit that, severity classification, throttle windows, and escalation policies are first-class primitives, not afterthoughts. The tool closes two gaps, pipeline visibility and alert fidelity, without introducing a separate observability SaaS such as Datadog or New Relic.

Vooster was used to structure the requirements and auto-distribute sprint tasks so that a single SRE plus a reviewer can ship and maintain the tool. The Vooster-generated PRD and task tree were fed directly into Claude Code, which shipped webhook ingestion, the throttle engine, and the Slack template service in 2-3 day cycles each. After one month in production, median MTTR dropped from 12 minutes to 4 minutes 30 seconds, making OpsConsole the "fastest ROI internal tool" on the team's list.

Problem Statement

Ops teams at growth-stage startups cannot separate signal from noise when deploy events spike, and incident response lags accordingly. The incumbent options fall short in four distinct ways. First, raw Slack webhook integrations mix successful builds with failures, so severity is indistinguishable at a glance. Second, stand-alone PagerDuty can page the on-call engineer but cannot show "what broke, in which environment, and how many times in a row" in a single dashboard view. Third, log viewers like Grafana or Loki lack a semantic classification of pipeline events (build/test/deploy), forcing each team to rebuild a dashboard from scratch. Fourth, Slack App workflows on their own cannot accumulate or query throttle and escalation state, which makes post-incident retros impossible. In practice, real outage notifications arrive 7-10 minutes late on average, and alert floods from repeated errors train responders to tune out notifications entirely. OpsConsole fuses semantic event classification, severity-based throttling, and time-aware routing into a single workflow that unifies the dashboard with paging. The guiding principle is that an on-call engineer should be able to answer "what is broken right now, and who owns it" in under five seconds. On top of that, every alert keeps a permanent history so the following week's retro can examine the bottlenecks.

Target Personas

Persona A — On-call SRE / Platform Engineer (alias: Hyunwoo Jung)

Role: Platform team engineer at a 10-15 engineer org, weekly night on-call rotation
Context: Must respond instantly to Slack alerts outside business hours and decide severity within the first five minutes
Daily Pain:

Persona B — Deploy-approving Tech Lead (alias: Sejung Yoon)

Role: Product team tech lead, reviews 40-70 PRs per week and approves prod deploys
Context: Watches success/failure signals for the ten minutes after a deploy and moves on
Daily Pain:

Persona C — QA / Release Manager (alias: Soomin Han)

Role: Release manager at a mid-market B2B SaaS team, owns the twice-weekly prod release cadence
Context: Watches pipeline state around the release window and coordinates hotfix flows on failure
Daily Pain:

User Stories

Core Feature Specs

F1. Webhook event ingestion & classification

Purpose: Ingest GitHub Actions / GitLab CI webhooks and auto-classify build/test/deploy events by environment (staging/prod)
Behavior:
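F1's classification step can be pictured as mapping each incoming job to a stage and an environment. A minimal sketch, assuming the webhook payload has already been normalized into a job name, an environment string, and a conclusion (GitHub Actions reports conclusions such as `success` and `failure`); the keyword table and the names `classify`, `Stage`, and `PipelineEvent` are hypothetical:

```python
import re
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    BUILD = "build"
    TEST = "test"
    DEPLOY = "deploy"
    UNKNOWN = "unknown"


# Hypothetical keyword table, checked in insertion order (deploy first).
STAGE_KEYWORDS: dict[Stage, tuple[str, ...]] = {
    Stage.DEPLOY: ("deploy", "release", "rollout"),
    Stage.TEST: ("test", "spec", "e2e"),
    Stage.BUILD: ("build", "compile", "package"),
}


@dataclass(frozen=True)
class PipelineEvent:
    stage: Stage
    environment: str  # "staging", "prod", or "unknown"
    failed: bool


def classify(job_name: str, environment: str, conclusion: str) -> PipelineEvent:
    # Split "unit-tests" into ["unit", "tests"] so a token like "latest" never matches "test".
    tokens = re.split(r"[^a-z0-9]+", job_name.lower())
    stage = Stage.UNKNOWN
    for candidate, keywords in STAGE_KEYWORDS.items():
        if any(tok.startswith(k) for tok in tokens for k in keywords):
            stage = candidate
            break
    env = environment.lower()
    if env in ("prod", "production"):
        env = "prod"
    elif env != "staging":
        env = "unknown"
    return PipelineEvent(stage, env, conclusion.lower() == "failure")
```

Because deploy keywords are checked first, a job named `build-and-deploy` is classified as a deploy, which errs toward the higher-stakes stage.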

F2. Severity-based throttle & escalation engine

Purpose: Damp repeat alert floods and route unacknowledged pages to a secondary responder
Behavior:
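The two F2 behaviors, damping repeats and escalating unacknowledged pages, reduce to two time comparisons. A minimal sketch using the 10-minute throttle and 15-minute escalation values from the PRD layout table; the alert key format (for example service, environment, and severity joined by colons) and the names `ThrottleEngine` and `needs_escalation` are hypothetical, and a production engine would persist this state rather than keep it in memory:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

THROTTLE_WINDOW = timedelta(minutes=10)
ESCALATION_TIMEOUT = timedelta(minutes=15)


@dataclass
class ThrottleEngine:
    window: timedelta = THROTTLE_WINDOW
    _last_sent: dict[str, datetime] = field(default_factory=dict, init=False)
    _suppressed: dict[str, int] = field(default_factory=dict, init=False)

    def should_send(self, key: str, now: datetime) -> bool:
        """Send the first alert per key, then suppress repeats inside the window."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True

    def pop_suppressed(self, key: str) -> int:
        """Return and reset the repeat count, e.g. for a "3 repeats suppressed" line."""
        return self._suppressed.pop(key, 0)


def needs_escalation(paged_at: datetime, acked_at: Optional[datetime], now: datetime,
                     timeout: timedelta = ESCALATION_TIMEOUT) -> bool:
    """True when a page has gone unacknowledged for the full escalation timeout."""
    return acked_at is None and now - paged_at >= timeout
```

The window is measured from the last delivered alert rather than the last repeat, so a steady failure still resurfaces every 10 minutes instead of being silenced indefinitely.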

F3. Slack alert template & deploy-diff context

Purpose: Provide Block Kit message templates by environment and severity plus immediate deploy-diff context
Behavior:
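A Block Kit payload for F3 could pair a severity-tagged header with a link to the commit range being deployed. A minimal sketch: `header`, `section` (with `fields`), and `context` are standard Block Kit block types and `<url|label>` is Slack's mrkdwn link syntax, while the emoji mapping, the function name `build_alert`, and the assumption that the deploy diff is a GitHub compare URL are hypothetical:

```python
# Hypothetical severity-to-emoji mapping.
SEVERITY_EMOJI = {
    "critical": ":red_circle:",
    "warning": ":large_yellow_circle:",
    "info": ":white_circle:",
}


def build_alert(*, service: str, environment: str, severity: str, stage: str,
                repo: str, base_sha: str, head_sha: str, run_url: str) -> dict:
    """Build a chat.postMessage-style payload (channel field omitted)."""
    title = f"{SEVERITY_EMOJI.get(severity, '')} [{environment.upper()}] {service} {stage} failed".strip()
    compare_url = f"https://github.com/{repo}/compare/{base_sha}...{head_sha}"
    diff_label = f"Deploy diff {base_sha[:7]}...{head_sha[:7]}"
    return {
        "text": title,  # plain-text fallback shown in notifications
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": title, "emoji": True}},
            {"type": "section",
             "fields": [
                 {"type": "mrkdwn", "text": f"*Severity:*\n{severity}"},
                 {"type": "mrkdwn", "text": f"*Environment:*\n{environment}"},
             ]},
            {"type": "context",
             "elements": [
                 {"type": "mrkdwn",
                  "text": f"<{compare_url}|{diff_label}> | <{run_url}|Pipeline run>"},
             ]},
        ],
    }
```

Putting the diff link in a `context` block keeps it visually secondary to the severity and environment fields while still one click away.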

F4. Role-based dashboard & weekly report

Purpose: Provide viewer/operator/admin access control and weekly aggregate reports for operational transparency
Behavior:
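The three roles in F4 can be treated as a strict hierarchy, so an access check compares a user's role against the minimum role an action requires. A minimal sketch, assuming higher roles inherit everything lower roles can do; the action names and their minimum roles are hypothetical:

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3


# Hypothetical action -> minimum role mapping.
REQUIRED_ROLE = {
    "view_dashboard": Role.VIEWER,
    "view_weekly_report": Role.VIEWER,
    "ack_alert": Role.OPERATOR,
    "edit_routing_rules": Role.ADMIN,
    "edit_alert_templates": Role.ADMIN,
    "view_audit_log": Role.ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """Allow only mapped actions whose minimum role is at or below the caller's role."""
    required = REQUIRED_ROLE.get(action)
    return required is not None and role >= required
```

Unknown actions are denied by default, so a newly added action stays inaccessible until it is explicitly mapped to a role.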

F5. Audit log & change history viewer

Purpose: Persist routing/alert-template/rotation changes as diffs so admins can look them up retrospectively
Behavior:
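Persisting F5's changes as diffs means recording only the fields that changed, each with its before and after value. A minimal sketch over flat configuration dictionaries; the `AuditEntry` shape, the field names, and the in-memory list standing in for the audit table are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    actor: str
    resource: str  # e.g. "routing", "alert-template", "rotation"
    at: datetime
    changes: dict[str, tuple[Any, Any]]  # field -> (before, after)


def diff_config(before: dict[str, Any], after: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Shallow diff; a key missing on one side is reported as None on that side."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


def record_change(log: list[AuditEntry], actor: str, resource: str,
                  before: dict[str, Any], after: dict[str, Any],
                  at: Optional[datetime] = None) -> Optional[AuditEntry]:
    """Append an entry only when something actually changed."""
    changes = diff_config(before, after)
    if not changes:
        return None
    entry = AuditEntry(actor, resource, at or datetime.now(timezone.utc), changes)
    log.append(entry)
    return entry
```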

Success Metrics

Each metric is aggregated in real time on the internal dashboard and automatically exported into the weekly retro packet.
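The headline metric, median MTTR measured from detection to first human acknowledgment (per the project overview), is a median over per-incident durations. A minimal sketch, assuming each incident is stored as a (detected_at, acked_at) pair with `acked_at` set to None while the page is still unacknowledged; the function name `median_time_to_ack` is hypothetical:

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional


def median_time_to_ack(incidents: Iterable[tuple[datetime, Optional[datetime]]]) -> timedelta:
    """Median of (acked_at - detected_at) over acknowledged incidents only."""
    durations = [
        (acked - detected).total_seconds()
        for detected, acked in incidents
        if acked is not None  # unacknowledged incidents have no duration yet
    ]
    if not durations:
        raise ValueError("no acknowledged incidents in range")
    return timedelta(seconds=median(durations))
```

Skipping unacknowledged incidents keeps the metric computable mid-week, but a long-unacked page only counts once it is acknowledged, so reporting the open count alongside the median avoids hiding it.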

Non-Functional Requirements

On-Call Flow & Alert Routing

OpsConsole's on-call and routing pipeline runs through four discrete stages: severity classification → throttle window → rotation match → escalation. Every stage is independently configurable, and edits are recorded as diffs in the audit log so retros can trace how the configuration changed over time.
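Because each of the four stages is independently configurable, the flow can be modeled as an ordered list of functions that either pass an alert on or drop it. A minimal sketch showing the severity and rotation-match stages; the severity rules, the weekly round-robin roster keyed on ISO week number, and every name here are hypothetical, and throttle and escalation stages would slot into the same list with the same signature:

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Alert:
    stage: str
    environment: str
    failed: bool
    severity: str = "info"
    responder: Optional[str] = None


# A stage takes an alert and the current time; returning None drops the alert.
StageFn = Callable[[Alert, datetime], Optional[Alert]]


def classify_severity(alert: Alert, now: datetime) -> Alert:
    """Hypothetical rule: prod failures are critical, other failures are warnings."""
    if not alert.failed:
        return replace(alert, severity="info")
    return replace(alert, severity="critical" if alert.environment == "prod" else "warning")


def make_rotation_matcher(roster: Sequence[str]) -> StageFn:
    """Weekly round-robin: the ISO week number picks the on-call engineer."""
    def match(alert: Alert, now: datetime) -> Alert:
        week = now.isocalendar()[1]
        return replace(alert, responder=roster[week % len(roster)])
    return match


def run_pipeline(alert: Alert, now: datetime, stages: Sequence[StageFn]) -> Optional[Alert]:
    """Apply stages in order; stop as soon as one drops the alert."""
    for stage in stages:
        result = stage(alert, now)
        if result is None:
            return None
        alert = result
    return alert
```

Keeping each stage a plain function means reconfiguring one stage is a matter of swapping one list entry, which leaves the other stages untouched.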

Scope Boundaries

V1 excludes the following.

Tech Stack & Architecture

The constraining assumption is the "internal tool" positioning — a single org is assumed, so multi-tenancy is an org-ID partition rather than full isolation. Offering this as an external SaaS would require a separate architecture pass. Incident automation itself (auto-rollback, auto-recovery playbooks) is explicitly out of scope for this phase; OpsConsole concentrates on the three axes of observe, route, and record.
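An org-ID partition, as opposed to full isolation, means every org shares the same tables and each query carries an `org_id` filter. A minimal sketch using SQLite, where binding the org ID once in a store object keeps individual queries from forgetting the filter; the schema and the `EventStore` name are hypothetical:

```python
import sqlite3

# Hypothetical schema: all orgs share one table, partitioned by org_id.
SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    org_id TEXT NOT NULL,
    stage TEXT NOT NULL,
    environment TEXT NOT NULL
);
CREATE INDEX events_org_env ON events (org_id, environment);
"""


class EventStore:
    """Every read and write is scoped to the org_id bound at construction."""

    def __init__(self, conn: sqlite3.Connection, org_id: str) -> None:
        self._conn = conn
        self._org_id = org_id

    def add(self, stage: str, environment: str) -> None:
        self._conn.execute(
            "INSERT INTO events (org_id, stage, environment) VALUES (?, ?, ?)",
            (self._org_id, stage, environment),
        )

    def stages_in(self, environment: str) -> list[str]:
        rows = self._conn.execute(
            "SELECT stage FROM events WHERE org_id = ? AND environment = ? ORDER BY id",
            (self._org_id, environment),
        ).fetchall()
        return [row[0] for row in rows]
```

Because the partition is enforced only in application code, a query written outside the store could still read another org's rows; that is the trade-off accepted by choosing a partition over full isolation.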

Task Tree

Sprint 1 — Ingestion & Observability

Pipeline event ingestion and base dashboard

Sprint 2 — On-call & Alerts

On-call routing rule engine and Slack alert templates
