
Prometheus & Grafana: Kubernetes Monitoring

A production guide to Prometheus and Grafana in 2026 — the pull-based architecture and TSDB, PromQL, node and kube-state-metrics exporters, service discovery, recording and alerting rules, Alertmanager, Grafana dashboards, the kube-prometheus-stack Helm chart, remote-write with Thanos and Mimir, and OpenTelemetry. Based on monitoring a GKE cluster running 26 microservices.

By Jose Nobile | Updated 2026-07-01 | 16 min read

Prometheus Architecture & TSDB

Prometheus is a pull-based monitoring system: the server periodically scrapes HTTP /metrics endpoints exposed by your applications and by exporters, rather than having agents push data to it. Each scrape returns metrics in the simple Prometheus text (or OpenMetrics) exposition format — one line per time series, with a metric name, a set of key/value labels, and a float value. This pull model makes targets self-describing and lets Prometheus detect a down target simply because a scrape fails, giving you a built-in up metric for free.

A time series is uniquely identified by its metric name plus the full set of label key/value pairs. That label set is the core data model: http_requests_total{method="GET",status="200",pod="api-7f9"} is a different series from the same metric with status="500". Prometheus 3.x is the current major line (3.0 landed in November 2024; the 3.x series ships on a roughly six-week cadence, with a designated 3.x LTS for conservative deployments). Prometheus 3 brought a rebuilt query UI, native UTF-8 metric and label names, a native OTLP receiver, and Remote Write 2.0.
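To make the identity rule concrete, here is a minimal Python sketch that parses one simple exposition line and derives a series identity from the metric name plus the label set. The function name parse_sample and the regular expressions are illustrative assumptions, not Prometheus code; the sketch ignores timestamps, comment lines, and escaped characters inside label values.

```python
import re

# One simple exposition line: name{labels} value
# (sketch only: no timestamps, no escapes inside label values, no comments)
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (series_identity, value) for one exposition line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    # Identity = metric name + the full label set, independent of label order.
    identity = (m.group("name"), frozenset(labels.items()))
    return identity, float(m.group("value"))
```

Two lines that differ only in status="200" versus status="500" produce different identities, while reordering the labels on a line does not change the identity.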

Samples land in the local TSDB, a purpose-built time-series store. Incoming data is buffered in an in-memory head block and a write-ahead log (WAL) for crash safety, then compacted into immutable 2-hour blocks on disk, which are progressively merged into larger blocks. Retention is bounded by --storage.tsdb.retention.time (default 15 days) or by size. Because the local TSDB is not clustered, the standard pattern for scale and durability is a pair of identical Prometheus servers for HA plus remote_write to a long-term backend such as Thanos or Grafana Mimir (covered below).

# Minimal prometheus.yml: global settings + one static scrape target
global:
  scrape_interval: 15s     # how often to scrape targets
  evaluation_interval: 15s # how often to evaluate rules
  external_labels:
    cluster: gke-prod
    region: us-central1

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

PromQL: The Query Language

PromQL is the functional query language you use to slice metrics, build alerts, and drive Grafana panels. An instant vector selector like http_requests_total{job="api"} returns one sample per matching series at a single instant; a range vector like http_requests_total{job="api"}[5m] returns all samples in a time window and is what most functions operate on. Label matchers support equality (=), negation (!=), and regex (=~, !~).
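The four matcher operators can be sketched in a few lines of Python. This is an illustration of the semantics only: the function name matches is hypothetical, and Prometheus uses RE2 syntax rather than Python's re module, so some patterns may behave differently. The key detail it shows is that regex matchers are fully anchored.

```python
import re


def matches(labels, name, op, value):
    """Evaluate one PromQL-style label matcher against a label dict."""
    actual = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    # Regex matchers are fully anchored: "5.." must match the whole value.
    hit = re.fullmatch(value, actual) is not None
    if op == "=~":
        return hit
    if op == "!~":
        return not hit
    raise ValueError(f"unknown operator: {op}")
```

With this sketch, status=~"5.." matches a series labelled status="500" but not one labelled status="5000", because the pattern has to cover the entire label value.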

The single most important function is rate(). Counters only ever increase (until a reset to zero on restart), so you almost never graph them raw — you compute a per-second average rate over a window: rate(http_requests_total[5m]). Use rate() for alerting and slow-moving graphs, and irate() only for fast, volatile dashboards. Always pick a range at least four times your scrape interval so each evaluation sees enough samples. To turn per-series rates into a service-level number, wrap them in an aggregation: sum(rate(...)) by (status).
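As a rough illustration of what rate() computes, the Python sketch below sums the increases between consecutive samples, treats any drop as a counter reset, and divides by the elapsed time. It is deliberately simplified: the real function also extrapolates towards the window boundaries, which this sketch (with the hypothetical name simple_rate) leaves out.

```python
def simple_rate(samples):
    """Per-second rate of a counter over a window.

    samples: list of (timestamp_seconds, value) pairs in time order.
    Simplified: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur < prev:
            # Counter reset: the counter restarted from zero, so the
            # whole current value counts as new increase.
            increase += cur
        else:
            increase += cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

This also shows why the window has to span several scrapes: with fewer than two samples in the range there is nothing to compute a rate from.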

For latency, applications expose histograms as _bucket, _sum, and _count series. Compute a quantile with histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)). Prometheus 3 also ships native histograms, a far more efficient exponential-bucket representation queried with the same function family. The example below is a classic RED (Rate, Errors, Duration) error-ratio expression — the backbone of most SLO alerts.

# Per-second request rate, summed by HTTP status
sum by (status) (
  rate(http_requests_total{job="api"}[5m])
)

# Error ratio (5xx as a fraction of all requests) over 5m
sum(rate(http_requests_total{job="api",status=~"5.."}[5m]))
  /
sum(rate(http_requests_total{job="api"}[5m]))

# p99 request latency from a histogram
histogram_quantile(
  0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# Aggregation, filtering and prediction
# Top 5 pods by memory in a namespace
topk(5,
  sum by (pod) (
    container_memory_working_set_bytes{namespace="production"}
  )
)

# Available memory below 10% right now
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10

# Will this filesystem fill within 4 hours? (linear prediction)
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4*3600) < 0
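The idea behind predict_linear() is a least-squares line fitted through the samples in the range and extended into the future. The Python sketch below is an approximation of that idea under stated assumptions, not the Prometheus implementation: the function name and argument layout are made up for illustration.

```python
def predict_linear(samples, eval_time, horizon_seconds):
    """Fit a least-squares line and predict the value horizon_seconds ahead.

    samples: list of (timestamp_seconds, value) pairs.
    eval_time: the instant the query is evaluated at.
    """
    n = len(samples)
    if n < 2:
        return None
    # Express time relative to the evaluation instant.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        return None
    slope = cov / var
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * horizon_seconds
```

A filesystem that loses 1 GB per hour and has 3 GB free now is predicted to be at about -1 GB in four hours, so a "< 0" comparison like the one in the query above would fire.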

Exporters: node & kube-state-metrics

Prometheus only knows how to scrape a /metrics endpoint, so anything that does not natively expose Prometheus metrics needs an exporter: a small process that translates some system's state into the exposition format. In Kubernetes, two exporters do most of the heavy lifting: node_exporter for machine-level metrics and kube-state-metrics (KSM) for the state of Kubernetes objects. Instrument your own apps directly with a client library rather than an exporter; official libraries exist for Go, Java, Python, Ruby, and Rust, and Node.js is covered by the widely used community library prom-client.

node_exporter runs as a DaemonSet (one Pod per node) and exposes hardware and OS metrics: CPU per mode (node_cpu_seconds_total), memory (node_memory_MemAvailable_bytes), disk, filesystem fill (node_filesystem_avail_bytes), and network. It reads from the host's /proc and /sys, so its Pod mounts those read-only. These are the metrics behind node CPU/memory/disk dashboards and capacity alerts.

kube-state-metrics is different: it does not report resource usage, it reports the desired vs actual state of API objects. It watches the API server and emits series like kube_deployment_status_replicas_available, kube_pod_status_phase, kube_pod_container_status_restarts_total, and kube_node_status_condition. KSM alone answers questions like “which Deployments are running fewer replicas than requested?”; join its state metrics with cAdvisor usage metrics (exposed by the kubelet) to answer questions like “which Pods are using more memory than they requested?” Both exporters ship pre-wired in the kube-prometheus-stack, so you rarely deploy them by hand, but you must understand which metric comes from where.

# node_exporter as a DaemonSet (abridged)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels: {app: node-exporter}
  template:
    metadata:
      labels: {app: node-exporter}
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:v1.9.1
        args: ["--path.rootfs=/host"]
        ports: [{containerPort: 9100, name: metrics}]
        volumeMounts:
        - {name: rootfs, mountPath: /host, readOnly: true}
      volumes:
      - {name: rootfs, hostPath: {path: /}}

Service Discovery & Scrape Config

In a cluster where Pods come and go every minute, hard-coding scrape targets is hopeless. Prometheus solves this with service discovery: it queries the Kubernetes API and automatically maintains the list of targets. A kubernetes_sd_configs entry supports several roles (node, service, pod, endpoints, endpointslice, and ingress), each returning a stream of targets decorated with discovery metadata as __meta_* labels.

Those raw discovery labels are shaped into your final target set with relabeling. relabel_configs run before the scrape and decide which targets to keep, which port and path to hit, and which labels to attach; metric_relabel_configs run after the scrape and can drop or rewrite individual noisy series before they hit the TSDB. The common convention is to only scrape Pods that carry an annotation such as prometheus.io/scrape: "true", reading the port and path from sibling annotations.
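The annotation convention works because Kubernetes discovery exposes each Pod annotation as a __meta_kubernetes_pod_annotation_* label, with characters that are not valid in a label name converted to underscores. A small Python sketch of that naming rule follows; the helper name is hypothetical.

```python
import re


def pod_annotation_meta_label(annotation_key):
    """Map a Pod annotation key to the discovery label that carries its value."""
    # Characters outside [a-zA-Z0-9_] are replaced with underscores.
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", annotation_key)
    return "__meta_kubernetes_pod_annotation_" + sanitized
```

For example, the annotation prometheus.io/scrape surfaces as __meta_kubernetes_pod_annotation_prometheus_io_scrape, which is the label name a keep rule has to reference.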

When you run the Prometheus Operator (the engine inside kube-prometheus-stack) you rarely write raw scrape_configs at all. Instead you create ServiceMonitor and PodMonitor custom resources that select targets by label, and the Operator generates the underlying Prometheus config for you. This keeps scrape configuration declarative, versioned in Git, and owned by each team alongside their app manifests. The three mechanisms below cover the vast majority of Kubernetes monitoring.

ROLE

pod / endpoints

Discover application targets. The endpoints role scrapes the Pods behind a Service; the pod role scrapes every matching Pod directly. Metadata exposes namespace, labels, and annotations for relabeling.

ROLE

node

Discover cluster nodes. Used to scrape the kubelet, cAdvisor, and the node_exporter DaemonSet. One target per node, addressed via the node's internal IP or the kubelet API.

CRD

ServiceMonitor / PodMonitor

Prometheus Operator custom resources that replace hand-written scrape configs. Select targets by label; the Operator renders the Prometheus config. Declarative, GitOps-friendly, per-team ownership.

# Scrape only annotated Pods, via relabeling
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
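      # use the annotated path (default /metrics)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # use the annotated port instead of the discovered one
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__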
      # copy namespace and pod name onto every series
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
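To see how keep and replace rules shape a target, the following Python sketch emulates a small subset of relabeling. It is a simplified model under stated assumptions: the relabel function and the RULES list are illustrative, only the keep, drop, and replace actions are covered, and replacements use Python's \1 syntax where Prometheus uses $1.

```python
import re


def relabel(labels, rules):
    """Apply relabel rules to one target's labels; return None if dropped."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";" (the default separator).
        joined = ";".join(labels.get(n, "") for n in rule["source_labels"])
        # Relabel regexes are anchored; the default regex is "(.*)".
        m = re.fullmatch(rule.get("regex", "(.*)"), joined)
        action = rule.get("action", "replace")
        if action == "keep":
            if m is None:
                return None
        elif action == "drop":
            if m is not None:
                return None
        elif action == "replace":
            if m is not None:
                # Default replacement is the first capture group.
                labels[rule["target_label"]] = m.expand(rule.get("replacement", r"\1"))
        else:
            raise ValueError(f"unsupported action: {action}")
    return labels


# Rules modelled on the annotation-based scrape config (port handling omitted).
RULES = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
     "target_label": "__metrics_path__", "regex": "(.+)"},
    {"source_labels": ["__meta_kubernetes_namespace"], "target_label": "namespace"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "target_label": "pod"},
]
```

A Pod without the scrape annotation yields None (the target is dropped), while an annotated Pod comes out with namespace and pod labels attached and, if the path annotation is set, a custom __metrics_path__.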

Recording & Alerting Rules

Rules are PromQL expressions that Prometheus evaluates on a schedule (evaluation_interval). There are two kinds. Recording rules pre-compute an expensive or frequently-used query and save the result as a new time series, so dashboards and alerts read a cheap pre-aggregated metric instead of recomputing a heavy expression on every refresh. By convention their names use a colon, e.g. job:http_requests:rate5m, which distinguishes them from raw exporter metrics.

Alerting rules fire when an expression returns results. The expr field defines the condition; the for field requires it to stay true for a duration before firing (this suppresses flapping on transient spikes); labels attach severity and routing keys; and annotations carry human-readable text with templating. A good alert is symptom-based and actionable: alert on user-facing SLO burn (error ratio, latency) rather than on every CPU blip. Prometheus only decides when an alert is firing; it then forwards it to Alertmanager, which decides who gets paged.

In production, a burn-rate strategy on the error-ratio recording rule pages the on-call engineer only when the SLO is genuinely at risk: a fast-burn alert (2% of the error budget consumed in 1 hour) triggers a page, while a slow-burn alert (10% of the budget consumed in 3 days) opens a ticket. This multi-window, multi-burn-rate approach, taken from the Google SRE workbook, cut alert fatigue dramatically compared with a naive “error rate > 1%” threshold.

# rules.yml: one recording rule + one alerting rule
groups:
  - name: http-slo
    interval: 30s
    rules:
      # RECORDING: pre-compute the 5m error ratio per job
      - record: job:http_errors:ratio5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))
      # ALERTING: page when the 5m error ratio stays above 5% for 5m
      - alert: HighErrorRatio
        expr: job:http_errors:ratio5m > 0.05
        for: 5m
        labels: {severity: page}
        annotations:
          summary: "High 5xx ratio on {{ $labels.job }}"
          description: "{{ $value | humanizePercentage }} of requests are failing."
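The rule above uses a fixed 5% threshold. Burn-rate thresholds are instead derived from the SLO, and the arithmetic is short enough to sketch in Python. The 30-day SLO period and the 99.9% target used in the example are assumptions for illustration; the budget fractions and windows are the ones commonly quoted from the Google SRE workbook.

```python
def burn_rate(budget_fraction, window_hours, slo_period_hours=30 * 24):
    """Burn rate that consumes budget_fraction of the error budget in window_hours."""
    return budget_fraction * slo_period_hours / window_hours


def error_ratio_threshold(slo_target, rate):
    """Error ratio at which the given burn rate is reached for an SLO target."""
    return rate * (1.0 - slo_target)
```

Under these assumptions, spending 2% of the budget in 1 hour corresponds to a burn rate of 14.4, which for a 99.9% SLO means alerting when the error ratio exceeds about 1.44%. Spending 10% in 3 days corresponds to a burn rate of 1.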

Alertmanager: Routing & Notifications

Alertmanager is a separate binary that receives firing alerts from one or more Prometheus servers and turns them into notifications. It handles the messy human side of alerting that Prometheus deliberately leaves out: grouping related alerts into a single notification, deduplication across HA Prometheus pairs, inhibition (suppress a warning when a related critical alert is already firing), silencing (mute alerts during a maintenance window), and delivery to receivers like PagerDuty, Opsgenie, Slack, email, or a generic webhook.

Configuration centers on a routing tree. The route block has a top-level default receiver and nested child routes that match on alert labels — so severity: page alerts go to PagerDuty while severity: ticket alerts open a Jira issue. Grouping keys (group_by) collapse many alerts sharing the same cluster/service into one message; group_wait, group_interval, and repeat_interval control the timing of first, follow-up, and re-notification messages.

Run Alertmanager as a small HA cluster (two or three replicas that gossip over a mesh) so a single node failure never drops a page. In the kube-prometheus-stack, Alertmanager is a first-class custom resource and its config lives in an AlertmanagerConfig CRD, letting each namespace own its own routing and receivers. Route by the same severity label your alerting rules set, and keep inhibit_rules so a “whole cluster down” critical does not bury you in a hundred per-service warnings.

# alertmanager.yml: routing tree + inhibition
route:
  receiver: slack-default
  group_by: ["alertname", "cluster", "namespace"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: ['severity="page"']
      receiver: pagerduty-oncall
      continue: false

receivers:
  - name: slack-default
    slack_configs:
      - channel: "#alerts"
        send_resolved: true
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <integration-key>

# inhibit warnings while a page-severity alert with the same namespace and alertname is firing
inhibit_rules:
  - source_matchers: ['severity="page"']
    target_matchers: ['severity="warning"']
    equal: ["namespace", "alertname"]
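The way an alert walks the routing tree can be modelled with a short recursive function. This Python sketch is a simplified model, not Alertmanager code: the dictionary layout with a "match" key is hypothetical, only equality matching is supported, and every route is assumed to name its own receiver rather than inheriting one.

```python
def route_alert(alert_labels, route):
    """Return the list of receivers an alert is delivered to."""
    receivers = []
    for child in route.get("routes", []):
        if all(alert_labels.get(k) == v for k, v in child["match"].items()):
            # Descend into the matching child route.
            receivers.extend(route_alert(alert_labels, child))
            if not child.get("continue", False):
                break  # first match wins unless continue is set
    if not receivers:
        # No child matched: the alert is handled by this node's receiver.
        receivers.append(route["receiver"])
    return receivers


# A tree modelled on the routing example: pages go to PagerDuty,
# everything else falls back to the default Slack receiver.
TREE = {
    "receiver": "slack-default",
    "routes": [
        {"match": {"severity": "page"}, "receiver": "pagerduty-oncall", "continue": False},
    ],
}
```

An alert labelled severity="page" resolves to pagerduty-oncall only, while a warning falls through to slack-default because no child route matches it.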

Grafana: Dashboards & Data Sources

Grafana is the visualization layer. It queries Prometheus (and dozens of other backends) and renders panels: time series, stat, gauge, table, heatmap, and the newer trend and canvas panels. Grafana 12 (the current 12.x line, with Grafana 13 released mid-2026) shipped a redesigned dashboard editing experience, dashboard schema v2, and Git Sync so dashboards can live in a repository as code. You add Prometheus as a data source by URL; in-cluster that is a Service address such as http://prometheus-server.monitoring.svc:80 (the Service name and port depend on how Prometheus was installed, so confirm them with kubectl get svc).

Do not build dashboards by clicking forever. Use template variables — a $namespace or $pod variable populated by a label_values() query — so one dashboard works across every namespace and pod. Reference the variable directly in PromQL (rate(http_requests_total{namespace="$namespace"}[5m])). Provision dashboards and data sources declaratively from ConfigMaps or files so they are reproducible; in the kube-prometheus-stack a dashboard is just a ConfigMap with a grafana_dashboard: "1" label that the Grafana sidecar auto-loads.
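One way to keep such dashboards reproducible is to generate the ConfigMap from code. The Python sketch below builds a ConfigMap manifest (as a plain dict that can be dumped to JSON) carrying the grafana_dashboard: "1" label and a dashboard with a $namespace variable. The dashboard body is a deliberately minimal assumption: dashboards exported from Grafana contain many more fields, and the kube_pod_info metric in the variable query assumes kube-state-metrics is being scraped.

```python
import json


def dashboard_configmap(name, namespace, dashboard):
    """Wrap a dashboard dict in a ConfigMap the Grafana sidecar can pick up."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"grafana_dashboard": "1"},
        },
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


# Minimal, hypothetical dashboard: one template variable and one panel.
DASHBOARD = {
    "title": "API overview",
    "templating": {
        "list": [
            {
                "name": "namespace",
                "type": "query",
                "query": "label_values(kube_pod_info, namespace)",
            }
        ]
    },
    "panels": [
        {
            "type": "timeseries",
            "title": "Request rate",
            "targets": [
                {"expr": 'sum(rate(http_requests_total{namespace="$namespace"}[5m]))'}
            ],
        }
    ],
}
```

Printing json.dumps(dashboard_configmap("api-overview", "monitoring", DASHBOARD), indent=2) gives a manifest that can be committed to Git and applied with kubectl.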

For the actual panels, lean on community dashboards as a starting point — the Node Exporter Full dashboard, the Kubernetes / Compute Resources set, and the Alertmanager overview are battle-tested — then trim them to the handful of RED and USE (Utilization, Saturation, Errors) panels your team actually reads during an incident. Grafana also has its own unified alerting engine that can evaluate rules against any data source, but many teams keep alert evaluation in Prometheus (for a single source of truth) and use Grafana purely for visualization.

# Provision the Prometheus data source (datasources.yaml)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.monitoring.svc:80
    isDefault: true
    jsonData:
      httpMethod: POST
      timeInterval: 15s  # match your scrape_interval
      prometheusType: Prometheus

kube-prometheus-stack Helm Chart

Nobody assembles a Kubernetes monitoring stack component-by-component in 2026. The kube-prometheus-stack Helm chart, maintained by the prometheus-community org, bundles everything into one install: the Prometheus Operator, Prometheus itself, Alertmanager, Grafana, node_exporter, kube-state-metrics, and a large set of curated default dashboards and alerting rules for cluster health. The chart is at major version 87.x and is distributed both from the traditional Helm repo and as an OCI artifact at oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack.

The key idea is the Prometheus Operator. Instead of editing prometheus.yml by hand, you manage everything through custom resources: a Prometheus CR defines the server, ServiceMonitor/PodMonitor select scrape targets, PrometheusRule holds recording and alerting rules, and Alertmanager/AlertmanagerConfig manage notifications. The Operator watches these objects and reconciles the running deployment — a fully GitOps-native, declarative model that fits Argo CD or Flux perfectly.

Install with a values override rather than defaults: pin persistent-volume sizes and retention, set resource requests/limits for Prometheus (memory scales with active series — budget roughly a few KB per series), enable persistence for Grafana, and turn on remoteWrite when you add long-term storage. A crucial gotcha: this chart installs CRDs, and Helm does not upgrade CRDs automatically on helm upgrade — apply the new CRDs from the release before bumping the chart across a major version.

# Install kube-prometheus-stack with a values override
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

helm install kps prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  -f values.yaml

# values.yaml (excerpt)
prometheus:
  prometheusSpec:
    retention: 15d
    replicas: 2         # HA pair
    resources:
      requests: {cpu: "1", memory: 4Gi}
      limits:   {memory: 8Gi}
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources: {requests: {storage: 100Gi}}
    # pick up all ServiceMonitors, not only those labelled for this Helm release
    serviceMonitorSelectorNilUsesHelmValues: false
grafana:
  enabled: true
  persistence: {enabled: true, size: 10Gi}
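The storage and memory figures in a values override are easier to defend with a back-of-the-envelope calculation. The Python sketch below uses the disk formula from the Prometheus storage documentation (retention seconds times ingested samples per second times bytes per sample, at roughly 1 to 2 bytes per sample). The 4 KiB-per-series memory figure is only an assumed stand-in for "a few KB per series" and should be replaced with a measurement from your own servers.

```python
def disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough TSDB disk need: retention * samples/second * bytes/sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


def head_memory_bytes(active_series, bytes_per_series=4096):
    """Rough memory for active series; bytes_per_series is an assumption."""
    return active_series * bytes_per_series
```

For one million active series scraped every 15 seconds and kept for 15 days, the disk estimate comes to roughly 173 GB at 2 bytes per sample, which is a reminder that a 100Gi volume suits a considerably smaller series count.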

Remote-Write, Long-Term Storage & OpenTelemetry

A single Prometheus server's local TSDB is not replicated and is usually kept to a few weeks of retention. For durable, multi-cluster, long-retention metrics you use remote_write: Prometheus keeps scraping and rule-evaluating locally, but also streams every sample to a remote backend. Prometheus 3 speaks Remote Write 2.0, a more compact protobuf format that carries metadata, native histograms, and exemplars alongside samples. The two dominant open-source backends are Thanos and Grafana Mimir.

Thanos takes the lowest-friction path: a sidecar next to each Prometheus uploads its TSDB blocks to object storage (S3/GCS), and a Thanos Querier fans out over all Prometheus instances plus the object-store Store Gateway to give you a single global, de-duplicated query view with downsampling for cheap long-range graphs. It is at the 0.4x series on a six-week cadence. Grafana Mimir takes the opposite approach: a horizontally scalable, multi-tenant microservices backend (its stable line is 3.x) that Prometheus pushes to via remote-write. Choose Thanos when you already run Prometheus and want the simplest bolt-on; choose Mimir when you need hard multi-tenant isolation and are serving hundreds of teams with a dedicated platform team.

OpenTelemetry ties the ecosystem together. Prometheus 3 ships a native OTLP receiver — enable --web.enable-otlp-receiver and applications can push OTLP metrics to /api/v1/otlp/v1/metrics instead of exposing a scrape endpoint. Going the other way, the OpenTelemetry Collector's prometheusremotewrite exporter forwards OTLP metrics straight into any Remote-Write backend (Prometheus, Thanos, Mimir), and its prometheus receiver can scrape existing Prometheus targets. A common 2026 pattern is a Collector as the ingestion front door for traces, logs, and metrics, remote-writing metrics to Mimir while your Prometheus servers keep doing rule evaluation and alerting.

# Prometheus remote_write to a long-term backend (Thanos Receive / Mimir)
remote_write:
  - url: https://mimir.example.com/api/v1/push
    headers:
      X-Scope-OrgID: team-a  # Mimir tenant
    queue_config:
      max_samples_per_send: 2000
      capacity: 10000
      max_shards: 30

# --- OpenTelemetry Collector: OTLP in, remote-write out ---
receivers:
  otlp:
    protocols: {grpc: {}, http: {}}
exporters:
  prometheusremotewrite:
    endpoint: https://mimir.example.com/api/v1/push
    headers: {X-Scope-OrgID: team-a}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]

Real-World: Production Monitoring Stack

The platform monitors a GKE cluster running 26 microservices with the kube-prometheus-stack: an HA pair of Prometheus servers scraping via ServiceMonitors, node_exporter and kube-state-metrics for infrastructure state, Alertmanager routing to PagerDuty and Slack, and Grafana for dashboards. Metrics remote-write to a long-term backend for 13-month retention, and SLO burn-rate alerts page on-call only when the error budget is genuinely at risk.

SLO Burn-Rate Alerting

Multi-window, multi-burn-rate alerts on recording-rule error ratios. A fast-burn alert pages on-call; a slow-burn alert opens a ticket. Alert fatigue dropped sharply versus naive static thresholds.

Declarative & GitOps

Every ServiceMonitor, PrometheusRule, and Grafana dashboard lives in Git and is reconciled by the Prometheus Operator via Argo CD. No hand-edited prometheus.yml, fully reproducible across clusters.

Long-Term Storage

Remote-write ships samples to object-store-backed long-term storage with downsampling, giving 13-month global queries for capacity planning while local Prometheus keeps only 15 days for fast recent lookups.

Latest Features (2025-2026)

Prometheus 3.x is the current major line: Prometheus 3.0 arrived in November 2024 — the biggest release in years — and the 3.x series has continued on a roughly six-week cadence through 2026 (3.10 landed in February 2026 with removable service discoveries via Go build tags and a distroless image variant). Headline changes over the 2.x era: a completely rebuilt query and TSDB UI, native UTF-8 support in metric and label names, a built-in OTLP receiver, Remote Write 2.0, and stable native histograms. A designated 3.x LTS line exists for teams that prefer slower, patch-only upgrades.

Native histograms: Traditional Prometheus histograms require you to pre-choose fixed buckets, which is both lossy and expensive in series count. Native histograms (promoted through the 3.x line) use an exponential bucketing scheme stored as a single sample, giving high-resolution latency distributions at a fraction of the cardinality. You query them with the same histogram_quantile() family, and remote-write 2.0 carries them end-to-end into Thanos and Mimir.
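The lossiness of fixed buckets is easiest to see in how a quantile is estimated from them. The Python sketch below approximates the classic-histogram estimate with linear interpolation inside the bucket that contains the target rank. It is a simplified model written for illustration, not the Prometheus source, and it assumes the lowest bucket starts at zero.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with float("inf") as the last upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf") or count == lower_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # the best available answer is the previous bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

Because the estimate can only interpolate between bucket bounds, a p95 that lands in a 0.5 s to 1.0 s bucket is reported somewhere on that line regardless of where the real observations sit, which is the resolution problem finer exponential buckets are meant to reduce.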

OTLP ingestion and OpenTelemetry convergence: Prometheus now natively accepts OpenTelemetry metrics over OTLP (enable --web.enable-otlp-receiver; endpoint /api/v1/otlp/v1/metrics), and the ecosystem has settled on a clean division of labor — the OpenTelemetry Collector as the vendor-neutral ingestion and processing layer, Prometheus/Mimir/Thanos as the metrics store and query engine, and Grafana as the UI. UTF-8 metric names in Prometheus 3 remove the last friction point when mapping OTLP names (dots) onto Prometheus.
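The friction that UTF-8 names remove is the older habit of rewriting OTLP metric names to fit the classic Prometheus character set. A simplified Python sketch of that rewrite is shown below. It only covers character replacement and is an assumption-laden illustration: real translators typically also append unit and type suffixes such as _seconds or _total.

```python
import re


def legacy_prometheus_name(otel_name):
    """Rewrite a metric name to the classic [a-zA-Z_:][a-zA-Z0-9_:]* charset."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)
    if name and name[0].isdigit():
        name = "_" + name  # names must not start with a digit
    return name
```

With this rule, http.server.request.duration becomes http_server_request_duration, whereas a UTF-8 aware setup can keep the dotted name as it was emitted.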

Grafana 12 and 13: Grafana 12 (current 12.x) shifted dashboards toward being managed as code: dashboard schema v2, Git Sync to store dashboards in a repository, and a reworked editing experience, alongside continued investment in the unified alerting engine and Grafana Alloy (the OpenTelemetry-based collector distribution that supersedes the Grafana Agent). Grafana 13 shipped in mid-2026. Pin your deployment to a supported release and treat dashboards as versioned artifacts rather than click-ops.

kube-prometheus-stack at 87.x, OCI-first: The de-facto Kubernetes install is now published both to the classic Helm repo and as an OCI artifact at oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack, tracking recent Prometheus, Grafana, and Prometheus Operator releases. Grafana Labs also shipped v4 of its separate Kubernetes Monitoring Helm chart in 2026, making exporter deployment (node_exporter, kube-state-metrics) explicit so you can point at existing instances instead of silently duplicating them.

Long-term storage maturity: Thanos continues its six-week release cadence in the 0.4x series with steady query-performance and object-store improvements, while Grafana Mimir reached its 3.x stable line (3.0 in November 2025) with stronger multi-tenancy and simpler operation. Both consume Prometheus Remote Write 2.0 and native histograms, so the choice is now mostly operational: bolt-on simplicity (Thanos) versus scalable multi-tenant platform (Mimir). VictoriaMetrics remains a popular drop-in alternative where raw ingest efficiency is the priority.

3.x

Prometheus 3 & Native Histograms

Rebuilt UI, UTF-8 metric names, Remote Write 2.0, and stable native histograms — high-resolution latency at a fraction of the series cardinality.

OTLP

Native OTLP Receiver

Prometheus ingests OpenTelemetry metrics directly via --web.enable-otlp-receiver. Collector-in, Prometheus-store, Grafana-UI is the settled 2026 pattern.

GRAFANA

Grafana 12 / 13

Dashboards as code: schema v2, Git Sync, reworked editor, unified alerting, and Grafana Alloy (OTel collector) superseding the Grafana Agent.

HELM

kube-prometheus-stack 87.x

OCI-first distribution via ghcr.io. Bundles Operator, Prometheus, Alertmanager, Grafana, node_exporter, and kube-state-metrics with curated rules.

STORAGE

Thanos 0.4x & Mimir 3.x

Both consume Remote Write 2.0 and native histograms. Thanos = bolt-on simplicity; Mimir = scalable multi-tenant platform. VictoriaMetrics a strong alternative.
