Senior HPC Cluster Engineer

Python

Client

We operate one of the largest GPU infrastructures in the world: 30,000+ GPUs and 10 InfiniBand fabrics across five global data centers. Our infrastructure doubles in size every year. We’re looking for engineers who love getting deep into Linux systems, pushing hardware and software to their limits, and making the world’s fastest AI and HPC workloads run even faster.

Why this role is exciting

You’ll join a small, senior team that works between the hardware and Linux OS layers, solving performance problems that affect tens of thousands of GPUs. This is hands-on, high-impact engineering where microsecond gains matter and every optimization is felt at global scale.

What you’ll do

Profile and optimize Linux kernel subsystems (CPU scheduling, memory management, networking stack) for GPU clusters and InfiniBand fabrics
Troubleshoot and resolve complex performance bottlenecks
Integrate and validate new GPU hardware (KVM/QEMU, PCIe devices, Kubernetes)
Improve monitoring, alerting, and automation for large-scale, distributed systems
Occasionally assist customers in optimizing workloads

We’d love to hear from you if you have

Solid Linux internals knowledge, ideally with kernel tuning or profiling experience (perf, ftrace, eBPF, sysprof, etc.)
Experience reading/debugging C or C++ system-level code
Scripting or development skills in Go, Python, or similar
A background in low-level, complex environments such as HPC, large-scale clusters, or high-performance networking

Bonus points for

GPU or HPC cluster experience
InfiniBand or other high-performance interconnect knowledge
Virtualization stacks (KVM/QEMU), Slurm, Kubernetes

This is for you if you

Love solving deep technical challenges, care about performance down to the microsecond, and want to work on infrastructure that pushes the limits of what’s possible

Location: Remote from anywhere in Europe
Salary: up to 160k + 25% bonus

Doghouse
