June 8, 2026

DevOps Engineer

Mid • Remote

Łódź, Poland

About the role

We are looking for a DevOps Engineer to help build and maintain automation, deployment, and reliability standards for large-scale GPU infrastructure used for AI training and inference workloads.

In this role, you will work on software-defined infrastructure supporting GPU clusters, high-performance networking, storage platforms, and internal AI services. This is a hands-on position for someone who is comfortable working close to infrastructure, improving operational processes, and building reliable automation in a complex technical environment.

Responsibilities

  • Design, implement, and maintain Infrastructure as Code solutions for provisioning and managing bare-metal GPU servers, networking, storage, and cluster orchestration components

  • Build and improve CI/CD pipelines for infrastructure, platform services, and internal tooling

  • Develop and maintain monitoring, logging, alerting, and observability solutions for large-scale GPU environments

  • Support reliability initiatives by defining and tracking SLIs/SLOs, automating incident response, and contributing to post-incident analysis

  • Automate operational tasks such as cluster scaling, firmware and BIOS updates, hardware validation, diagnostics, and capacity planning

  • Work closely with Infrastructure, Networking, Facilities, and AI/ML teams to ensure stable and scalable platform operations

  • Support DevSecOps practices, including infrastructure hardening, vulnerability management, and compliance automation

  • Identify repetitive manual work and replace it with efficient automation

  • Evaluate new tools and solutions related to GPU infrastructure, orchestration, and cloud-native operations

Requirements

  • 4–7 years of experience in DevOps, SRE, Platform Engineering, or a similar role

  • Strong practical experience with infrastructure automation in complex production environments

  • Good hands-on knowledge of Terraform, Ansible, or similar Infrastructure as Code tools

  • Experience building and maintaining CI/CD pipelines and working with GitOps practices

  • Good understanding of infrastructure security, vulnerability management, and security best practices

  • Experience with security tools such as Snyk, CrowdStrike, or similar solutions

  • Practical experience with Kubernetes

  • Experience working with GPU-related technologies such as NVIDIA GPU Operator, device plugins, MIG, or time-slicing

  • Good scripting or programming skills in Python, Go, or Bash

  • Experience with bare-metal provisioning, low-level infrastructure automation, or data center operations

  • Good knowledge of observability tools such as Prometheus, Grafana, Loki, and OpenTelemetry

  • Ability to work independently, prioritize tasks, and communicate effectively with technical teams

  • At least communicative English is required, as you will be working in an international team

Nice to have

  • Experience in AI infrastructure, HPC environments, hyperscale infrastructure, or data center operations

  • Familiarity with orchestration and scheduling tools such as Slurm, Ray, Run:ai, KServe, or Kubernetes-based schedulers

  • Experience integrating telemetry from power, cooling, or environmental systems

  • Experience building internal platforms or self-service tools for engineering teams

  • Understanding of compliance and audit requirements in security-sensitive environments

What we offer

  • Benefits package

  • Opportunity to work on advanced infrastructure supporting large-scale AI workloads

  • Real impact on the reliability and scalability of next-generation compute environments

  • Collaboration with experienced engineers across infrastructure, platform, and AI domains

  • A fast-moving environment with space for ownership, technical input, and professional growth