System Administrator

    Site Reliability Engineer – Bare Metal Linux – Data Center – Networking – €180k

    Our client is building a cloud platform for high-throughput, compute-heavy workloads. They operate large-scale infrastructure where failure modes are real, capacity is finite, and reliability needs to be engineered, not "handled".

    We're seeking a Senior SRE to own end-to-end production reliability for our client's platform: define SLIs/SLOs, run error-budget conversations, and ship changes that reduce incidents and improve tail latency (p95/p99). You'll build automation to eliminate toil, make deployments safer (canary releases and rollback), and turn observability into signal rather than noise.

    This is a bare-metal environment: think Linux, data centers, physical fleets, and real hardware constraints, not managed services. You'll work close to the metal across Kubernetes internals (scheduling, autoscaling behavior, kubelet pressure/evictions, etcd/control plane), Linux performance (CPU/memory/IO contention), and network debugging (DNS/TCP/TLS, packet loss, congestion). On-call is part of the job, but success is measured by how much you reduce it.

    Requirements

    • Production engineering experience running bare-metal / on-prem / data center infrastructure (not only public cloud)

    • Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, IO, kernel-level behavior)

    • Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)

    • Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane

    • Experience with Terraform, Docker, Helm, and modern CI/CD practices

    • Coding skills in Go, Python, and/or C

    If you're looking for complexity and a new place to nerd out on infrastructure optimization, we'd love to hear from you.

    Location: Amsterdam – hybrid

    Total compensation: up to €180k