December 17, 2025

Senior Site Reliability Engineer

Senior • Remote

$180 - $210/

Warsaw, Poland

Our Client is an international organization developing a modern, highly available digital platform used by millions of users.

The project focuses on building and maintaining scalable cloud infrastructure, automating processes, improving reliability, and implementing Site Reliability Engineering (SRE) best practices.

We are looking for an experienced Senior Site Reliability Engineer who will take ownership of production environments, enhance observability, and automate the entire application lifecycle.

WORK MODE: 100% remote

RESPONSIBILITIES

Designing, implementing, and scaling resilient infrastructure in AWS (multiple accounts, production and pre-production environments)
Maintaining and evolving Kubernetes (EKS) environments using Helm, ArgoCD, and Terraform, ensuring predictable and auditable deployment processes
Collaborating with product and platform teams on SRE best practices (SLIs/SLOs, error budgets, reliability reviews)
Building and improving observability using Dynatrace, Grafana, cloud-native metrics, and open-source tools
Optimizing Cloudflare configuration (WAF, cache and routing rules, perimeter security) to improve performance and security
Automating infrastructure, deployments, and routine tasks using GitHub Actions, Python, and Bash
Participating in incident response, leading post-mortems, and turning lessons learned into tangible improvements

REQUIREMENTS

Minimum 5 years of experience in an SRE/DevOps role in cloud-based production environments (AWS preferred, Azure acceptable)
Strong proficiency with Terraform, Helm, ArgoCD, and GitHub Actions
Excellent knowledge of Kubernetes (EKS): autoscaling, rollout strategies, troubleshooting, and cluster architecture
Experience building and maintaining observability pipelines (logs, metrics, traces, SLIs/SLOs, alerting)
Ability to design high-availability and fault-tolerant systems
Solid understanding of CI/CD principles and GitOps practices
Experience with Cloudflare (DNS, CDN, WAF, rulesets)
Hands-on experience with monitoring tools such as Dynatrace, Prometheus, and Grafana
Very good command of English (collaboration with teams in Europe and the US)
Experience in incident response: on-call rotations, RCA, post-mortems

NICE TO HAVE

Concrete examples of improvements you have delivered in SLO/SLI management or alert fatigue reduction
Contributions to automation or observability tooling
Experience leading reliability reviews and promoting a post-mortem culture
Interest in resilience engineering and knowledge sharing within the SRE community

WHY JOIN?

Stable, long-term B2B cooperation directly with the end client
Work on high-scale projects with real impact on a platform used by millions of users
Full technical autonomy with real influence over architecture, solutions, and reliability standards
100% remote work, flexible hours, and an async-friendly environment
Mature engineering culture, partnership-based collaboration, and teamwork with experts from Europe and the US
Access to a modern tech stack: AWS, EKS, Terraform, ArgoCD, Cloudflare, Dynatrace, and cloud-native tools

TQLO Sp. z o.o. – Employment Agency (KRAZ No. 33580)

Thank you for all applications. We will contact selected candidates.

TQLO SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ

TQLO SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ is a rapidly growing technology company specializing in advanced AI solutions. The company is known for its innovative approach and commitment to developing scalable and efficient backend services, particularly in cloud environments. TQLO values high performance, reliability, and security in its applications, and fosters a work culture focused on partnership and engineering quality. The company collaborates closely with teams in the USA, emphasizing a global perspective and cross-border teamwork. As an employment agency, TQLO is dedicated to long-term, stable collaborations, offering a flexible and remote work environment that supports modern technological advancements.